Abstract:
METHODS FOR DETERMINING AT LEAST A PORTION OF THE GENOME OF AN UNBORN FETUS OF A PREGNANT FEMALE, FOR IDENTIFYING A DE NOVO MUTATION IN THE GENOME OF AN UNBORN FETUS OF A PREGNANT FEMALE, FOR DETERMINING A FRACTIONAL CONCENTRATION OF FETAL DNA IN A BIOLOGICAL SAMPLE TAKEN FROM A PREGNANT FEMALE, AND FOR DETERMINING A PROPORTION OF A FETAL GENOME THAT HAS BEEN SEQUENCED FROM A BIOLOGICAL SAMPLE TAKEN FROM A PREGNANT FEMALE, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM. Systems, methods, and apparatus for determining at least a portion of the fetal genome are provided. DNA fragments from a maternal sample (containing maternal and fetal DNA) can be analyzed to identify alleles at certain loci. The amounts of DNA fragments of the respective alleles at these loci can be analyzed together to determine the relative amounts of the haplotypes for these loci and to determine which haplotypes were inherited from the parental genomes. Loci where the parents are a specific combination of homozygous and heterozygous can be analyzed to determine regions of the fetal genome. Reference haplotypes common in the population can be used together with the analysis of DNA fragments from the maternal sample to determine the maternal and paternal genomes. (...).
Publication number: BR112012010694B1
Application number: R112012010694-5
Filing date: 2010-11-05
Publication date: 2020-11-17
Inventors: Yuk Ming Dennis Lo; Kwan Chee Chan; Wai Kwun Rossa Chiu; Charles Cantor
Applicants: Chinese University Of Hong Kong; Sequenom, Inc.
Main IPC:
Description:

CROSS-REFERENCES TO RELATED APPLICATIONS
[001] This application claims priority from and is a non-provisional application of U.S. Provisional Application No. 61/258567, entitled "Fetal Genomic Analysis" filed on November 5, 2009; U.S. Provisional Application No. 61/259075, entitled “Fetal Genomic Analysis from a Maternal Biological Sample” filed on November 6, 2009; and U.S. Provisional Application No. 61/381854, entitled “Fetal Genomic Analysis from a Maternal Biological Sample” filed on September 10, 2010, the entire contents of which are incorporated herein by reference for all purposes.
[002] This application is also related to U.S. Application No. 12/178,181, entitled “Diagnosing Fetal Chromosomal Aneuploidy Using Massively Parallel Genomic Sequencing” filed on July 23, 2008 (Attorney Docket No. 016285-005220US); U.S. Application No. 12/614350, entitled “Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment” (Attorney Docket No. 016285-005221US); and the concurrently filed U.S. application entitled “Size-Based Genomic Analysis” (Attorney Docket No. 016285-006610US), the entire contents of which are incorporated herein by reference for all purposes.
BACKGROUND
[003] The present invention relates generally to the analysis of a fetal genome based on a maternal sample, and more particularly to the determination of all or parts of the fetal genome based on an analysis of genetic fragments in the maternal sample.
[004] The discovery of cell-free fetal nucleic acids in maternal plasma in 1997 opened up new possibilities for noninvasive prenatal diagnosis (Lo YMD et al. Lancet 1997; 350: 485-487; and US Patent 6,258,540). This technology was rapidly translated into clinical applications, with the detection of fetal-derived, paternally inherited genes or sequences, for example for the determination of fetal sex, the determination of fetal RhD status, and the determination of whether the fetus has inherited a paternally inherited mutation (Amicucci P et al. Clin Chem 2000; 46: 301-302; Saito H et al. Lancet 2000; 356: 1170; and Chiu RWK et al. Lancet 2002; 360: 998-1000). Recent progress in the field has enabled the prenatal diagnosis of fetal chromosomal aneuploidies, such as trisomy 21, from the analysis of maternal plasma nucleic acids (Lo YMD et al. Nat Med 2007; 13: 218-223; Tong YK et al. Clin Chem 2006; 52: 2194-2202; US Patent Publication 2006/0252071; Lo YMD et al. Proc Natl Acad Sci USA 2007; 104: 13116-13121; Chiu RWK et al. Proc Natl Acad Sci USA 2008; 105: 20458-20463; Fan HC et al. Proc Natl Acad Sci 2008; 105: 16266-16271; US Patent Publication 2007/0202525; and US Patent Publication 2009/0029377).
[005] Another area of significant recent progress is the use of single-molecule counting methods, such as digital PCR, for the noninvasive prenatal diagnosis of single-gene diseases in which the mother and father both carry the same mutation. This was achieved by analyzing the relative mutation dosage (RMD) in maternal plasma (US Patent Application 2009/0087847; Lun FMF et al. Proc Natl Acad Sci USA 2008; 105: 19920-19925; and Chiu RWK et al. Trends Genet 2009; 25: 324-331).
[006] However, such methods rely on prior knowledge of possible mutations to analyze specific parts of a genome, and thus may not identify latent or unusual genetic mutations or diseases. Therefore, it is desirable to provide new methods, systems, and apparatus that can identify all or parts of a fetal genome using noninvasive techniques.
BRIEF SUMMARY
[007] Certain embodiments of the present invention can provide methods, systems, and apparatus for determining at least a portion of the genome of an unborn fetus of a pregnant female. A genetic map of the entire genome, or of selected genomic region(s), can be constructed for the fetus prenatally using a sample containing fetal and maternal genetic material (for example, a blood sample from the pregnant mother). The genetic map can be of the sequences that the fetus has inherited from both its father and mother, or just those inherited from one of the parents. Based on one or more of such genetic maps, the risk that the fetus would be suffering from a genetic disease, or would have a predisposition to a genetic or other disease or a genetic trait, can be determined. Other applications of the embodiments are also described herein.
[008] In one embodiment, DNA fragments from a maternal sample (containing maternal and fetal DNA) can be analyzed to identify alleles at certain specified loci (landmarks). The amounts of DNA fragments of the respective alleles at these loci can then be analyzed together to determine the relative amounts of the haplotypes for these loci and thereby determine which haplotypes were inherited by the fetus from the maternal and/or paternal genomes. By identifying the fetal haplotypes, the fetal genotype at an individual locus within the corresponding genomic region including the specified loci can be determined. In various embodiments, the loci where the parents are a specific combination of homozygous and heterozygous can be analyzed in a way to determine regions of the fetal genome. In one implementation, reference haplotypes that are representative of common haplotypes in the population are used together with the analysis of DNA fragments from the maternal sample to determine the maternal and paternal genomes. Other embodiments are also provided, such as determining mutations, determining a fractional fetal concentration in a maternal sample, and determining the proportion of the fetal genome covered by a sequencing of the maternal sample.
[009] Other embodiments of the invention are directed to systems, apparatus, and computer-readable media associated with the methods described herein. In one embodiment, the computer-readable medium contains instructions for receiving data and analyzing data, but not instructions for directing a machine to create the data (for example, the sequencing of nucleic acid molecules). In another embodiment, the computer-readable medium contains instructions for directing a machine to create the data. In one embodiment, a computer program product comprises a computer-readable medium that stores a plurality of instructions for controlling a processor to perform an operation of the methods described herein. Embodiments are also directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps.
[0010] Reference to the remaining portions of the specification, including the drawings and claims, will reveal other features and advantages of the embodiments of the present invention. Other features and advantages, as well as the structure and operation of various embodiments of the present invention, are described in detail below with respect to the accompanying drawings. In the drawings, identical reference numbers may indicate identical or functionally similar elements.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 is a flow chart of a method 100 of determining at least a portion of the genome of an unborn fetus of a pregnant woman according to the embodiments of the present invention.
[0012] FIG. 2 shows two haplotypes for the father and two haplotypes for the mother for a particular segment of their respective genomic code according to the embodiments of the present invention.
[0013] FIG. 3 shows the two types of SNPs in the parental haplotypes of FIG. 2 according to the embodiments of the present invention.
[0014] FIGS. 4A and 4B show an analysis to determine fetal haplotypes for the two types of SNPs according to the embodiments of the present invention.
[0015] FIGS. 5A and 5B show the analysis of comparing relative amounts (e.g. counts) of fragments for each locus and whether a result of the comparison classifies a particular haplotype as being inherited or not according to the embodiments of the present invention.
[0016] FIG. 6 illustrates the effect of changing the likelihood ratio for the SPRT classification according to the embodiments of the present invention.
[0017] FIG. 7 is a flow chart of a method 700 of determining at least a portion of the genome of an unborn fetus of a pregnant woman that is inherited from the father according to the embodiments of the present invention.
[0018] FIG. 8 is a flow chart of a method 800 for determining at least a portion of the genome of an unborn fetus in a region where the mother and father are heterozygous according to the embodiments of the present invention.
[0019] FIG. 9 shows haplotypes of a father and mother who are both heterozygous in a particular genomic region according to the embodiments of the present invention.
[0020] FIG. 10 is a flow chart illustrating a method 1000 for determining the fractional concentration of fetal material in a maternal sample according to the embodiments of the present invention.
[0021] FIG. 11 is a flow chart of a method for determining whether a locus is informative according to the embodiments of the present invention.
[0022] FIGS. 12A and 12B show the predicted distributions of the counts for the T allele (the less abundant allele in scenarios (a) and (c)) for the three scenarios with an assumed fractional fetal DNA concentration of 20% and 5%, respectively, according to the embodiments of the present invention.
[0023] FIGS. 13A, 13B, and 14 show the predicted distributions of the counts of the less abundant allele for a fractional fetal DNA concentration of 20%, each for a different total count of molecules corresponding to a SNP, according to the embodiments of the present invention.
[0024] FIGS. 15A and 15B show examples of reference haplotypes, parental haplotypes taken from the reference haplotypes, and a resulting fetal haplotype according to the embodiments of the present invention.
[0025] FIG. 16 is a flow chart of a method 1600 for determining at least part of a fetal genome when a set of reference haplotypes are known, but parental haplotypes are not known, according to the embodiments of the present invention.
[0026] FIG. 17 shows an example of determining informative loci from the analysis of DNA fragments from a maternal sample according to the embodiments of the present invention.
[0027] FIG. 18 shows the three reference haplotypes (Hap A, Hap B and Hap C) and the paternal alleles.
[0028] FIG. 19 shows the determination of the parental haplotype from the paternal alleles according to the embodiments of the present invention.
[0029] FIG. 20 shows the deduction of maternal genotypes from the analysis of the maternal sample according to the embodiments of the present invention.
[0030] FIG. 21 shows an embodiment for determining the maternal haplotypes from the maternal genotypes and the reference haplotypes according to the embodiments of the present invention.
[0031] FIG. 22 shows the determined maternal haplotypes and the paternally inherited haplotype according to the embodiments of the present invention.
[0032] FIG. 23 shows the different types of loci (alpha (A) and beta (B)) for the maternal haplotypes relative to the paternal haplotype according to the embodiments of the present invention.
[0033] FIG. 24 is a flow chart illustrating a method 2400 of identifying a de novo mutation in the genome of an unborn fetus of a pregnant woman.
[0034] FIG. 25A shows the absolute number and percentages of SNPs showing different genotype combinations for the father, mother and fetus (CVS) according to the embodiments of the present invention.
[0035] FIG. 25B shows a table that lists the alignment statistics for the first 20 flow cells.
[0036] FIG. 26 is a table showing the fractional concentrations of fetal DNA calculated for SNPs via two methods according to the embodiments of the present invention.
[0037] FIG. 27A shows a plot illustrating the observed percentage of SNPs in this subset in which a fetal allele can be observed from the sequencing data for the first 20 flow cells analyzed, and FIG. 27B shows a plot of the coverage vs. the number of reads according to the embodiments of the present invention.
[0038] FIGS. 28A and 28B show plots of the correlation between the coverage of paternally inherited alleles and the number of mappable sequence reads and the number of flow cells sequenced, respectively, according to the embodiments of the present invention.
[0039] FIG. 29A shows the correlation between the false positive rate and the number of flow cells sequenced, and FIG. 29B shows the correlation between the false positive rate and the number of flow cells sequenced according to the embodiments of the present invention.
[0040] FIG. 30 shows the coverage of fetus-specific SNPs for different numbers of flow cells analyzed according to the embodiments of the present invention.
[0041] FIG. 31 shows the accuracy of Type A analysis when data from 10 flow cells were used in accordance with the embodiments of the present invention.
[0042] FIG. 32 shows the accuracy of Type B analysis when data from 10 flow cells were used in accordance with the embodiments of the present invention.
[0043] FIG. 33 shows the accuracy of Type A analysis when data from 20 flow cells were used in accordance with the embodiments of the present invention.
[0044] FIG. 34 shows the accuracy of Type B analysis when data from 20 flow cells were used in accordance with the embodiments of the present invention.
[0045] FIGS. 35A and 35B show reads with one of the mutations and with a wild-type sequence at codons 41/42 according to the embodiments of the present invention.
[0046] FIG. 36 shows a table of results from an RHDO Type A analysis, whereas those from an RHDO Type B analysis are shown in FIG. 37, according to the embodiments of the present invention.
[0047] FIGS. 38A and 38B show the results of the SPRT classification for the case PW226 as an example.
[0048] FIG. 39 shows a table that summarizes the RHDO analysis results for the five cases according to the embodiments of the present invention.
[0049] FIG. 40 shows a plot of sequencing depth against the number of flow cells sequenced according to the embodiments of the present invention.
[0050] FIG. 41 shows a plot of the sizes of the fetal and total sequences for the entire genome, and FIGS. 42A to 42C show similar plots individually for each chromosome according to the embodiments of the present invention.
[0051] FIG. 43 shows a block diagram of an exemplary computer system 4300 usable with the system and methods according to the embodiments of the present invention.
DEFINITIONS
[0052] The term "biological sample" as used herein refers to any sample that is taken from an individual (for example, a human being, such as a pregnant woman) and contains one or more nucleic acid molecules of interest.
[0053] The term "nucleic acid" or "polynucleotide" refers to a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) and a polymer thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions can be obtained by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19: 5081 (1991); Ohtsuka et al., J. Biol. Chem. 260: 2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8: 91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, mRNA, small non-coding RNA, micro RNA (miRNA), Piwi-interacting RNA, and short hairpin RNA (shRNA) encoded by a gene or locus.
[0054] The term "gene" means the segment of DNA involved in producing a polypeptide chain or a transcribed RNA product. It can include regions that precede and follow the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons).
[0055] The term "clinically relevant nucleic acid sequence" (also referred to as a target sequence or chromosome) as used herein can refer to a polynucleotide sequence corresponding to a segment of a larger genomic sequence whose potential imbalance is being tested, or to the larger genomic sequence itself. One example is the sequence of chromosome 21. Other examples include chromosomes 18, 13, X, and Y. Yet other examples include mutated genetic sequences, genetic polymorphisms, or copy number variations that a fetus may inherit from one or both of its parents, or that arise as a de novo mutation in the fetus. In some embodiments, multiple clinically relevant nucleic acid sequences, or equivalently multiple markers of the clinically relevant nucleic acid sequence, can be used to provide data for detecting the imbalance. For example, data from five non-consecutive sequences on chromosome 21 can be used in an additive manner to determine a possible imbalance of chromosome 21, effectively reducing the required sample volume to 1/5.
[0056] The term "based on" as used herein means "based at least in part on" and refers to one value (or result) being used in the determination of another value, such as occurs in the relationship of an input of a method and the output of that method. The term "derive" as used herein also refers to the relationship of an input of a method and the output of that method, such as occurs when the derivation is the calculation of a formula.
[0057] The term "parameter" as used herein means a numerical value that characterizes a set of quantitative data and / or a numerical relationship between the sets of quantitative data. For example, a ratio (or function of a ratio) between a first amount of a first nucleic acid sequence and a second amount of a second nucleic acid sequence is a parameter.
[0058] As used herein, the term "locus" or its plural form "loci" is a location or address of any length of nucleotides (or base pairs) that has a variation across genomes.
[0059] The term "sequence imbalance" as used herein means any significant deviation, as defined by at least one cutoff value, in an amount of the clinically relevant nucleic acid sequence from a reference amount. A sequence imbalance can include chromosome dosage imbalance, allelic imbalance, mutation dosage imbalance, haplotype dosage imbalance, and other similar imbalances. As an example, an allelic or mutation dosage imbalance can occur when a fetus has a different genotype from the mother, thereby creating an imbalance at a particular locus in the sample.
[0060] The term "aneuploidy" as used herein means a variation in the quantitative amount of a chromosome from that of a diploid genome. The variation can be a gain or a loss. It can involve the whole of a chromosome or a region of a chromosome.
[0061] The term "haplotype" as used herein refers to a combination of alleles at multiple loci that are transmitted together on the same chromosome or chromosomal region. A haplotype can refer to as few as one pair of loci, to a chromosomal region, or to an entire chromosome. The term "alleles" refers to alternative DNA sequences at the same physical genomic locus, which may or may not result in different phenotypic traits. In any particular diploid organism, with two copies of each chromosome (except the sex chromosomes in a human male), the genotype for each gene comprises the pair of alleles present at that locus, which are the same in homozygotes and different in heterozygotes. A population or species of organisms typically includes multiple alleles at each locus among the various individuals. A genomic locus where more than one allele is found in the population is called a polymorphic site. The allelic variation at a locus is measurable as the number of alleles (that is, the degree of polymorphism) present, or as the proportion of heterozygotes (that is, the heterozygosity rate) in the population. As used herein, the term "polymorphism" refers to any inter-individual variation in the human genome, regardless of its frequency. Examples of such variations include, but are not limited to, single nucleotide polymorphisms, simple tandem repeat polymorphisms, insertion-deletion polymorphisms, mutations (which can be disease-causing), and copy number variations.
DETAILED DESCRIPTION
[0062] A partial genetic map or complete genomic sequence of an unborn fetus can be constructed based on the haplotypes of polymorphic sequences of its parents. The term "haplotype" as used herein refers to a combination of alleles at multiple loci that are transmitted together on the same chromosome or chromosomal region. For example, embodiments can analyze DNA fragments from a maternal sample (containing maternal and fetal DNA) to identify alleles at certain specified loci (landmarks). The amounts of DNA fragments of the respective alleles at these loci can then be analyzed together to determine the relative amounts of the haplotypes for these loci and thereby determine which haplotypes were inherited by the fetus from the maternal and/or paternal genomes. By identifying the fetal haplotypes, the fetal genotype at an individual locus within the corresponding genomic region including the specified loci can be determined. In various embodiments, loci where the parents are a specific combination of homozygous and heterozygous can be analyzed in a way to determine regions of the fetal genome. In one implementation, reference haplotypes that are representative of common haplotypes in the population are used in conjunction with the analysis of DNA fragments from the maternal sample to determine the maternal and paternal genomes.
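As a purely illustrative sketch of analyzing allele amounts together across loci, the following Python fragment sums the read counts that support each of two maternal haplotypes over the loci at which the haplotypes differ. The function name, the data layout, and the toy counts are hypothetical and are not taken from the specification, and the statistical test used to decide whether an observed difference is significant is deliberately left out.

```python
from collections import Counter

def haplotype_counts(locus_counts, hap_i, hap_ii):
    """Sum read counts supporting each of two haplotypes.

    locus_counts: dict mapping locus id -> Counter of allele -> read count
    hap_i, hap_ii: dict mapping locus id -> allele carried by that haplotype
    Loci where both haplotypes carry the same allele are skipped because
    they cannot distinguish the two haplotypes.
    """
    total_i = total_ii = 0
    for locus, counts in locus_counts.items():
        allele_i, allele_ii = hap_i[locus], hap_ii[locus]
        if allele_i == allele_ii:
            continue
        total_i += counts.get(allele_i, 0)
        total_ii += counts.get(allele_ii, 0)
    return total_i, total_ii

# Hypothetical toy data: three loci, the third is homozygous in the mother.
locus_counts = {
    "snp1": Counter({"A": 55, "G": 45}),
    "snp2": Counter({"C": 52, "T": 48}),
    "snp3": Counter({"G": 60}),
}
hap_i = {"snp1": "A", "snp2": "C", "snp3": "G"}
hap_ii = {"snp1": "G", "snp2": "T", "snp3": "G"}

n_i, n_ii = haplotype_counts(locus_counts, hap_i, hap_ii)
print(n_i, n_ii)
```

Pooling counts over many linked loci in this way gives each haplotype a larger total count than any single locus could provide, which is the motivation for analyzing the loci together rather than one at a time.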
[0063] An example of an application of an embodiment that determines at least part of a fetal genome is paternity testing, by comparing the deduced fetal genotype or haplotype with the genotype or haplotype of the alleged father. Another example is to detect one or more de novo mutations that the fetus has acquired, or to detect meiotic recombination events that occurred during the production of gametes by its parents. These are the gametes that were fertilized, with the resulting zygote developing into the fetus.
[0064] In addition, some embodiments may also allow the genomic sequence of the unborn fetus to be determined at any desired resolution. For example, in certain applications, embodiments may allow the complete or nearly complete genomic sequence of the fetus to be determined. In one embodiment, the resolution of the fetal genomic sequence that can be determined depends on the resolution of the knowledge of the genomes of the father and mother, in conjunction with the sequencing information from the maternal biological sample containing fetal nucleic acids. In the event that the complete or nearly complete genomic sequences of the father and mother are known, the complete or nearly complete genomic sequence of the unborn fetus can be deduced.
[0065] In other embodiments, only the genomic sequences of selected regions within the genome are elucidated, for example, for the prenatal diagnosis of selected genetic, epigenetic (such as imprinting disorders), or chromosomal disorders. Examples of genetic disorders to which an embodiment can be applied include hemoglobinopathies (such as beta-thalassemia, alpha-thalassemia, sickle cell anemia, hemoglobin E disease), cystic fibrosis, and sex-linked disorders (such as hemophilia and Duchenne muscular dystrophy). Other examples of mutations that can be detected using an embodiment can be found in Online Mendelian Inheritance in Man (OMIM).
[0066] Some embodiments can also be used to determine a fractional concentration of fetal DNA, which can be done without any prior knowledge of the specific genomes of the parents. A similar analysis can also be used to determine the depth of coverage needed for an accurate determination of the fetal genome. Thus, this coverage determination can be used to estimate how much data needs to be analyzed to obtain accurate results.
I. INTRODUCTION
[0067] When a maternal sample (for example plasma or serum) is used as the material for elucidating the fetal haplotype, there can be two main challenges. A first challenge is that maternal plasma or serum consists of a mixture of fetal and maternal DNA, with fetal DNA being the minor population. It has been determined that fetal DNA represents a mean/median concentration of some 5% to 10% of the total DNA in maternal plasma in the first two trimesters of pregnancy (Lo YMD et al. Am J Hum Genet 1998; 62: 768-775; Lun FMF et al. Clin Chem 2008; 54: 1664-1672). Since DNA is released by maternal blood cells during the blood clotting process, the fractional concentration of fetal DNA in maternal serum can be even lower than that in maternal plasma. Thus, in some embodiments, maternal plasma is preferred over maternal serum.
[0068] A second challenge is that fetal DNA and maternal DNA in maternal plasma consist of short fragments (Chan KCA et al. Clin Chem 2004; 50: 88-92). Indeed, fetal-derived DNA is generally shorter than maternal-derived DNA in maternal plasma. Most of the fetal DNA in maternal plasma is less than 200 base pairs in length. Using only such short plasma DNA fragments, it can be challenging to construct the haplotype of genetic polymorphisms over long genomic distances. The above-mentioned challenges for maternal plasma and serum also apply to the detection of fetal DNA in maternal urine (Botezatu I et al. Clin Chem 2000; 46: 1078-1084). Fetal DNA represents only a minor fraction of the DNA in the urine of a pregnant woman, and fetal DNA in maternal urine also consists of short DNA fragments.
A. Sequencing and Analysis of the Maternal Sample
[0069] An approach that some embodiments have adopted to address the first challenge is to use a method that allows the quantitative genotyping of nucleic acids obtained from the maternal biological sample with high precision. In one embodiment of this approach, the precision is achieved by analyzing a large number (for example, millions or billions) of nucleic acid molecules. In addition, the precision can be enhanced by the analysis of single nucleic acid molecules or the clonal amplification of single nucleic acid molecules. One embodiment uses massively parallel DNA sequencing, such as, but not limited to, that performed by the Illumina Genome Analyzer platform (Bentley DR et al. Nature 2008; 456: 53-59), the Roche 454 platform (Margulies M et al. Nature 2005; 437: 376-380), the ABI SOLiD platform (McKernan KJ et al. Genome Res 2009; 19: 1527-1541), the Helicos single molecule sequencing platform (Harris TD et al. Science 2008; 320: 106-109), real-time sequencing using single polymerase molecules (Eid J et al. Science 2009; 323: 133-138), and nanopore sequencing (Clarke J et al. Nat Nanotechnol. 2009; 4: 265-70). In one embodiment, massively parallel sequencing is performed on a random subset of the nucleic acid molecules in the biological sample.
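The link between the number of molecules analyzed and the precision of quantitative genotyping can be illustrated with a short calculation. The sketch below assumes, as a simplification not stated in the specification, that the molecules covering a locus are sampled independently, so that the observed allele fraction follows a binomial distribution; the function name is hypothetical.

```python
import math

def allele_fraction_se(p, n):
    """Standard error of an observed allele fraction.

    p: true fraction of molecules carrying the allele (0..1)
    n: number of molecules counted at the locus
    Assumes independent (binomial) sampling of molecules.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)

# Counting 100 times more molecules shrinks the error 10-fold.
for n in (100, 10_000, 1_000_000):
    print(n, allele_fraction_se(0.5, n))
```

Under this assumption the error falls with the square root of the count, which suggests why resolving small shifts in allele fraction calls for very large numbers of molecules.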
[0070] In some embodiments, it can be beneficial to obtain as long a sequence read from each molecule as possible. One limitation on the length of the sequencing reads that can be achieved is the nature of the nucleic acid molecules in the maternal biological sample. For example, it is known that most DNA molecules in maternal plasma consist of short fragments (Chan KCA et al. Clin Chem 2004; 50: 88-92). In addition, the read length has to be balanced against the fidelity of the sequencing system at long read lengths. For some of the above-mentioned systems, it might be preferable to obtain sequence from both ends of the molecule, the so-called paired-end sequencing. As an illustration, one approach is to perform 50 base pairs of sequencing from each end of a DNA molecule, thus resulting in a total of 100 base pairs of sequence per molecule. In another embodiment, 75 base pairs of sequencing can be performed from each end of a DNA molecule, thus resulting in a total of 150 base pairs of sequence per molecule.
[0071] After the sequencing is performed, the sequences are then aligned back to a reference human genome. As the embodiments elucidate the genomic variations inherited by an unborn fetus from its parents, the alignment algorithm should be able to cope with sequence variations. One example of such a software package is the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software produced by Illumina. Other examples of such software packages are the SOAP (short oligonucleotide alignment program) and SOAP2 software (Li R et al. Bioinformatics 2008; 24: 713-714; Li R et al. Bioinformatics 2009; 25: 1966-1967).
[0072] The amount of DNA sequencing that may need to be performed can depend on the resolution at which the fetal genetic map or fetal genomic sequence needs to be constructed. In general, the more molecules that are sequenced, the higher the resolution. Another determinant of the resolution of the fetal genetic map or fetal genomic sequence at a given level, or depth, of DNA sequencing is the fractional concentration of fetal DNA in the maternal biological sample. In general, the higher the fractional concentration of fetal DNA, the higher the resolution of the fetal genetic map or fetal genomic sequence that can be elucidated at a given level of DNA sequencing. Since the fractional concentration of fetal DNA in maternal plasma is higher than that in maternal serum, maternal plasma is a more preferred type of maternal biological sample than maternal serum for some embodiments.
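The dependence on both sequencing depth and fractional fetal DNA concentration can be made concrete with a simple model. The sketch below considers a locus where the mother is homozygous and the fetus is heterozygous, so that only half of the fetal molecules carry the fetus-specific allele. It assumes independent sampling of molecules and no sequencing errors; these assumptions and the function names are illustrative and are not part of the specification.

```python
def fetal_specific_allele_fraction(fetal_fraction):
    """Expected fraction of molecules at the locus carrying the
    fetus-specific allele (mother homozygous, fetus heterozygous)."""
    return fetal_fraction / 2.0

def prob_fetal_allele_seen(fetal_fraction, depth):
    """Probability that at least one of `depth` molecules covering the
    locus carries the fetus-specific allele, assuming independent
    sampling and no sequencing error."""
    q = fetal_specific_allele_fraction(fetal_fraction)
    return 1.0 - (1.0 - q) ** depth

# Compare two fetal fractions over a range of depths.
for depth in (10, 30, 100):
    print(depth,
          round(prob_fetal_allele_seen(0.05, depth), 3),
          round(prob_fetal_allele_seen(0.20, depth), 3))
```

In this model, a lower fetal fraction has to be compensated by a greater depth to reach the same chance of observing the fetal allele, in line with the qualitative statement above.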
[0073] The throughput of the above-mentioned sequencing-based methods can be increased with the use of indexing or barcoding. Thus, a sample- or patient-specific index or barcode can be added to the nucleic acid fragments in a particular nucleic acid sequencing library. Then, a number of such libraries, each with a sample- or patient-specific index or barcode, are mixed together and sequenced together. Following the sequencing reactions, the sequencing data can be collected for each sample or patient based on the barcode or index. This strategy can increase the throughput and thus the cost-effectiveness of the embodiments of the present invention.
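The collection of sequencing data per sample after pooled sequencing can be sketched as follows. For simplicity the sketch assumes that the barcode occupies the first bases of each read and must match exactly; real platforms may deliver the index as a separate read and tolerate mismatches. The function name, barcodes, and reads are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by their leading barcode.

    reads: iterable of read sequences (strings)
    barcode_to_sample: dict mapping barcode sequence -> sample name
    barcode_len: number of leading bases that form the barcode
    Returns a dict mapping sample name -> list of reads with the barcode
    removed; reads with an unknown barcode are kept whole under
    the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample)

# Hypothetical pooled run with two indexed libraries.
barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
pooled = ["ACGTTTGACA", "TGCAGGCCTA", "ACGTCCCCAA", "NNNNAAAA"]
result = demultiplex(pooled, barcodes, barcode_len=4)
print(result)
```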
[0074] In one embodiment, the nucleic acid molecules in the biological sample can be selected or fractionated prior to quantitative genotyping (e.g. sequencing). In one variant, the nucleic acid molecules are treated with a device (for example a microarray) that preferentially binds nucleic acid molecules from selected loci in the genome (for example the region on chromosome 7 containing the CFTR gene). The sequencing can then be performed preferentially on the nucleic acid molecules captured by the device. This scheme allows the sequencing to be targeted to the genomic region of interest. In one embodiment of this scheme, a NimbleGen sequence capture system, an Agilent SureSelect Target Enrichment System, or a similar platform can be used. In some embodiments, the nucleic acid molecules from the selected regions of the genome are subjected to random sequencing.
[0075] In another embodiment, the genomic region of interest in the biological sample can first be amplified by one set or multiple sets of amplification primers. Then, quantitative genotyping, for example sequencing, can be performed on the amplified products. In one implementation of this scheme, the RainDance system can be used. In some embodiments, the amplified nucleic acid molecules are subjected to random sequencing.
[0076] A size fractionation step can also be performed on the nucleic acid molecules in the biological sample. As fetal DNA is known to be shorter than maternal DNA in maternal plasma (Li et al. Clin Chem 2004; 50: 1002-1011; US Patent Application 20050164241; US Patent Application 20070202525), the smaller molecular size fraction can be harvested and then used for quantitative genotyping, for example, sequencing. Such a fraction would contain a higher fractional concentration of fetal DNA than the original biological sample. Thus, sequencing a fraction enriched in fetal DNA can allow the fetal genetic map to be constructed, or the fetal genomic sequence to be deduced, with a higher resolution at a particular level of analysis (for example, depth of sequencing) than if a non-enriched sample were used. This can therefore make the technology more cost-effective. As examples of methods for size fractionation, one can use (i) gel electrophoresis followed by the extraction of nucleic acid molecules from specific gel fractions; (ii) nucleic acid binding matrices with differential affinity for nucleic acid molecules of different sizes; or (iii) filtration systems with differential retention for nucleic acid molecules of different sizes.
[0077] In another embodiment, one could preferentially analyze nucleic acid molecules of a specific size or size range following the nucleic acid sequencing. For example, one could perform paired-end sequencing, in which both ends of a DNA molecule are sequenced. Then, the genomic coordinates of both ends can be mapped back to a reference human genome, and the size of the molecule can be deduced by subtracting the genomic coordinates of the two ends. One way to perform such paired-end sequencing is to use the paired-end sequencing protocol of the Illumina Genome Analyzer. Another method for deducing the size of a DNA molecule is to sequence the entire DNA molecule. This is most easily done with sequencing platforms having relatively long read lengths, such as the Roche 454 platform (Margulies et al. Nature 2005; 437: 376-380) and the Pacific Biosciences single molecule, real-time (SMRT®) technology (Eid et al. Science 2009; 323: 133-138). Following the deduction of the size of the nucleic acid molecules, one could choose to focus the subsequent analysis on molecules shorter than a particular size cutoff, thereby enriching the fractional concentration of fetal DNA. The analysis of this subset of molecules may allow the fetal genetic map or the fetal genomic sequence to be deduced with fewer analyzed molecules after the size selection than would be needed if this procedure were not performed. In one embodiment, a size cutoff of 300 base pairs is used. In yet other embodiments, a size cutoff of 250 base pairs, 200 base pairs, 180 base pairs, 150 base pairs, 125 base pairs, 100 base pairs, or 75 base pairs can be used.
B. Use of Parental Genomes as Frames
[0078] To address the second challenge, some embodiments may use haplotypes of the mother's chromosomes as a 'frame'. The haplotypes of the father's chromosomes can also be used as another 'frame'. This frame can be compared against the genetic information of the fetus obtained from the maternal sample containing fetal DNA. This fetal genetic information can be used to determine how the mother's and/or father's frame was assembled into the fetal genome, thereby using the component parts of the frame to determine the resulting fetal genome.
[0079] Parental haplotypes can be constructed from the genomic DNA of the father and mother, and of other family members, for example a sibling of the fetus in the current pregnancy. It is possible that the availability of parental haplotypes may become increasingly common, in view of the reduction in the costs of genomic sequencing. In one scenario, if one or both parents have already had their genomes sequenced and their haplotypes in one or more chromosomal regions have been determined, then this information can be used as the frame mentioned above.
[0080] Any genotyping platform known to those of skill in the art that can interrogate sequence variations in the genome can be used, including DNA sequencing, microarrays, hybridization probes, fluorescence-based techniques, optical techniques, molecular barcodes and single molecule imaging (Geiss GK et al. Nat Biotechnol 2008; 26: 317-325), single molecule analysis, PCR, digital PCR, mass spectrometry (such as the Sequenom MassARRAY platform), etc. As a more extreme example, the DNA sequence of the father and mother can be determined by whole-genome DNA sequencing using a massively parallel sequencing method (for example Bentley DR et al. Nature 2008; 456: 53-59; McKernan KJ et al. Genome Res 2009; 19: 1527-1541). One example of sequence variations that may be of interest is single nucleotide polymorphisms (SNPs). A particularly preferred method for determining parental genotypes is microarray analysis of SNPs on a genome-wide basis, or in selected genomic regions, for example those containing genes whose mutations can cause genetic diseases (such as genes in the beta-globin cluster, or the cystic fibrosis transmembrane conductance regulator (CFTR) gene). Apart from sequence variations, copy number variations can also be used. Sequence variations and copy number variations are both referred to as polymorphic genetic features (PMF).
[0081] In one aspect, the maternal genotypes on the chromosomes or chromosomal regions of interest can be constructed into haplotypes. One way in which this can be done is by analyzing other family members related to the mother, for example a son or daughter of the mother, a parent, a sibling, etc. Another way in which the haplotypes can be constructed is through other methods well known to those skilled in the art.
[0082] The genotype information can then be extended into haplotype information of the parents by comparison with the genotype information of other family members, for example, a sibling of the fetus of the current pregnancy, or with the genotypes of the grandparents, etc. The parents' haplotypes can also be constructed by other methods well known to those skilled in the art. Examples of such methods include methods based on single molecule analysis such as digital PCR (Ding C and Cantor CR. Proc Natl Acad Sci USA 2003; 100: 7449-7453; Ruano G et al. Proc Natl Acad Sci USA 1990; 87: 6296-6300), sperm haplotyping (Lien S et al. Curr Protoc Hum Genet 2002; Chapter 1: Unit 1.6) and imaging techniques (Xiao M et al. Hum Mutat 2007; 28: 913-921). Other methods include those based on allele-specific PCR (Michalatos-Beloin S et al. Nucleic Acids Res 1996; 24: 4841-4843; Lo YMD et al. Nucleic Acids Res 1991; 19: 3561-3567), cloning and restriction enzyme digestion (Smirnova AS et al. Immunogenetics 2007; 59: 93-8), etc. Still other methods are based on the distribution and linkage disequilibrium structure of haplotype blocks in the population, which allows the maternal haplotype to be deduced from statistical evaluations (Clark AG. Mol Biol Evol 1990; 7: 111-22; 10: 13-9; Salem RM et al. Human Genomics 2005; 2: 39-66).
Use of Genomic Information from the Maternal Sample to Assemble the Frame
[0083] In one embodiment, to determine which of the maternal chromosomes has been passed to the fetus, a relative haplotype dosage (RHDO) method is used. A general principle of this method is as follows, for an example in which the mother is heterozygous for each of the genetic polymorphisms. In that case there are two haplotypes, and the relative dosage of these haplotypes would be 1:1. However, in the maternal sample, the presence of a small proportion of fetal DNA would alter the relative haplotype dosage. This is because the fetus would have inherited half of its haplotype complement from the mother and the other half from the father. In addition, for each chromosome, the fetus would have inherited a 'patchwork' of haplotypes that originated from one or the other of the homologous chromosomes of each parent, depending on the occurrence of meiotic recombination. All of these factors can cause the relative haplotype dosage to deviate from the 1:1 ratio of the maternal constitutional DNA. Thus, for a given chromosome or chromosomal region, the constituent alleles of these haplotypes can be sought from analytical data (for example, sequencing data) generated from the maternal sample.
[0084] Then, a statistical procedure can be performed to determine the relative haplotype dosage, or whether one of these haplotypes is overrepresented relative to the other haplotype. The classification threshold for this statistical procedure can be adjusted depending on the fractional fetal DNA concentration. In general, a higher fractional concentration of fetal DNA can allow the threshold to be reached with fewer molecules. The classification threshold can also be adjusted depending on the number of successfully classified fragments that one wishes to obtain across the genome or the genomic regions of interest. In one embodiment, the sequential probability ratio test (SPRT) can be used.
[0085] In one embodiment, a relative mutation dosage (RMD) analysis, as described in US Patent Application 2009/0087847, can be used to determine a relative amount of an allele at particular polymorphisms of the mother. These relative amounts can be used in the determination of a haplotype of the fetus (for example when the polymorphisms are at consecutive or linked loci). One implementation of this targeted method uses the polymerase chain reaction (PCR) to amplify specific sequences from selected parts of the genome for the RMD analysis. To extend this RMD method to determine fetal inheritance over a large genomic region or the entire genome, a large volume of maternal sample is required.
[0086] In one embodiment using random sequencing, the genomic regions of interest are not specifically targeted. Thus, the number of sequences obtained in the genomic regions of interest may not be as large as in a targeted method (unless very deep sequencing is performed). However, the counts can be aggregated across a number of linked polymorphisms to achieve the statistical power needed for diagnostic purposes. A practical implication of using this sequencing embodiment is that it can save costs by avoiding the need for excessively deep sequencing. It also requires the input of a smaller amount of maternal sample than methods based on digital PCR.
[0087] In addition, it may be desirable to perform such an RHDO analysis in blocks. In other words, each chromosome can be analyzed in one, or preferably more than one, block. In one aspect, the latter may allow meiotic recombination to be observed. For example, the haplotype of one segment of a particular chromosome of the fetus may appear to have originated from one of the maternal homologous chromosomes, while another segment of the same fetal chromosome appears to have the haplotype of the other maternal homologous chromosome. A SPRT analysis can allow this segmentation to be performed.
[0088] For example, SPRT analysis can be performed on neighboring SNPs demonstrating the required parental genotypic configuration (that is, the father being homozygous and the mother being heterozygous), starting from one end of a chromosome. This continues until the SPRT analysis has indicated that one of the maternal haplotypes is predominant in the maternal plasma analytical data (for example, sequencing data). Then, the SPRT analysis can be reset and started again from the next neighboring SNP demonstrating the required parental genotypic configuration. This can continue until the SPRT analysis has once again indicated that one of the maternal haplotypes is predominant in the maternal plasma analytical data (for example, sequencing data). This process can continue until the last selected SNP on said chromosome. Then, the various haplotype segments determined on the chromosome can be compared with the haplotypes of the two homologous chromosomes in the mother's genome. A meiotic recombination is observed when the haplotype segments in the fetus appear to have switched from one maternal homologous chromosome to the other. This system also works if there is more than one meiotic recombination per chromosome.
[0089] As described later, RHDO analysis can also be performed for genomic regions where the father and mother are both heterozygous for the constituent genetic polymorphisms. This scenario is particularly useful when the father and mother share a mutant copy of the disease gene from the same ancestral origin, such as when they are consanguineous, or when the predominant mutation for the disease is due to a large founder effect (that is, most individuals with the mutation have inherited the same haplotype from a common ancestral founder of the population). Thus, the haplotypes of the father and mother in this region can be used to deduce the fetal haplotype.
II. CONSTRUCTION OF THE FETAL GENOME FROM THE MATERNAL GENOME
[0090] The construction of a fetal genetic map, or the elucidation of the fetal genomic sequence, with explicit knowledge of the parental genomes is now described.
A. Method
[0091] FIG. 1 is a flow chart of a method 100 of determining at least a portion of the genome of an unborn fetus of a pregnant woman. The fetus has a father and a mother, the mother being the pregnant woman. The father has a paternal genome with two haplotypes and the mother has a maternal genome with two haplotypes. Method 100 analyzes nucleic acid molecules (fragments) from a biological sample obtained from the pregnant woman to determine the genome of the fetus. Method 100 is described primarily for the example in which the father is homozygous and the mother is heterozygous at a plurality of loci, while other examples describe other embodiments.
[0092] Method 100 and any of the methods described here can be totally or partially performed with a computer system including a processor, which can be configured to perform the steps. Thus, the embodiments are directed to the computer systems configured to carry out the steps of any of the methods described here, potentially with different components that carry out a respective step or a respective group of steps. Although presented as numbered steps, the method steps here can be performed at the same time or in a different order. Additionally, portions of these steps can be used with portions of other steps from other methods. Also, all or portions of a step can be optional. In addition, any of the steps of any of the methods can be performed with modules, circuits, or other means to perform these steps.
[0093] In step 110, a first plurality of loci are identified at which the maternal genome is heterozygous. In one embodiment, this determination can be made as part of a genotyping of the father and mother at the genome-wide level or at selected genomic loci of interest. In other embodiments, the determination of the first plurality of loci can be made during an analysis of the maternal sample, which is described in later sections.
[0094] In step 120, each of the two maternal haplotypes that cover the first plurality of loci is determined. As mentioned above, the maternal genome can be obtained from direct sequencing. In other embodiments, genotyping can be done at a plurality of loci, and a mapped genome from someone who is expected to have a similar genome, for example a family member, or a reference genome that is common in the same or a similar population, can then be used. In one embodiment, step 120 can be performed first for all or parts of the maternal genome, and then the maternal genome can be investigated to find the loci at which the mother is heterozygous.
[0095] In one aspect, it is not essential to construct the haplotypes of the father's chromosomes. However, if the paternal haplotypes can be constructed, then additional information can be obtained from the sequencing results. Such additional information includes the fact that relative haplotype dosage analysis can be performed for regions in which both parents are heterozygous. Another additional piece of information that can be obtained if the paternal haplotype is available is information regarding meiotic recombination involving one or more paternal chromosomes, as well as whether disease alleles linked to such polymorphisms have been passed on to the fetus.
[0096] In step 130, an allele inherited by the fetus from the father at each of the first plurality of loci is determined. Some embodiments use genomic loci that are homozygous for the father but heterozygous for the mother (as mentioned in step 110). Thus, if the father is homozygous at the loci, then the allele that is inherited from the father is known. The genotyping of the father to determine loci at which the father is homozygous can be performed in any of the ways described herein. In one embodiment, the first plurality of loci can be determined based on the genotyping of the father and mother, in order to find loci at which the father is homozygous and the mother is heterozygous.
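The selection of loci with the required parental configuration can be sketched as follows. This is an illustration under stated assumptions: genotypes are represented as two-character strings keyed by a locus identifier, and the function name `informative_loci` and the locus names are hypothetical.

```python
def informative_loci(father, mother):
    """Select loci at which the father is homozygous and the mother is heterozygous.

    father, mother: dict mapping a locus identifier to a two-character
    genotype string such as "AG" (hypothetical representation).
    Returns a dict mapping each selected locus to the paternal allele,
    which is the allele the fetus necessarily inherits from the father.
    """
    selected = {}
    for locus, maternal_gt in mother.items():
        paternal_gt = father.get(locus)
        if paternal_gt is None:
            continue  # locus was not genotyped in the father
        father_homozygous = paternal_gt[0] == paternal_gt[1]
        mother_heterozygous = maternal_gt[0] != maternal_gt[1]
        if father_homozygous and mother_heterozygous:
            selected[locus] = paternal_gt[0]
    return selected

father = {"snp1": "AA", "snp2": "AG", "snp3": "GG", "snp4": "AA"}
mother = {"snp1": "AG", "snp2": "AG", "snp3": "GG", "snp4": "GA"}
loci = informative_loci(father, mother)
```

Here only snp1 and snp4 are retained; at snp2 the father is heterozygous and at snp3 the mother is homozygous.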
[0097] In another embodiment, a second plurality of loci at which the paternal genome is heterozygous can be used to determine the paternal haplotype inherited by the fetus at the first plurality of loci at which the father is homozygous. For example, if the maternal genome is homozygous at the second plurality of loci, alleles that are present in the paternal genome at respective ones of the second plurality of loci and absent in the maternal genome can be identified. The inherited paternal haplotype can then be identified as the haplotype carrying the identified alleles, and used to determine the allele inherited from the father at the first plurality of loci. These aspects of determining a paternal haplotype are discussed in more detail below.
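One simple way to picture this identification is to count, for each paternal haplotype, the paternal-specific alleles that are actually detected in the maternal sample. The sketch below assumes error-free allele detection and uses hypothetical data structures and names (`inherited_paternal_haplotype`, the locus labels); a practical implementation would need to account for sequencing errors and sampling depth.

```python
def inherited_paternal_haplotype(paternal_haps, maternal_homozygous, plasma_alleles):
    """Pick the paternal haplotype supported by paternal-specific alleles in plasma.

    paternal_haps: dict name -> {locus: allele} for the two paternal haplotypes.
    maternal_homozygous: {locus: allele} at loci where the mother is homozygous.
    plasma_alleles: {locus: set of alleles observed in the maternal sample}.
    Returns the haplotype name, or None if there is no evidence or a tie.
    """
    support = {}
    for name, hap in paternal_haps.items():
        support[name] = sum(
            1
            for locus, allele in hap.items()
            if locus in maternal_homozygous
            and allele != maternal_homozygous[locus]        # absent in the mother
            and allele in plasma_alleles.get(locus, set())  # detected in the sample
        )
    ranked = sorted(support.items(), key=lambda item: item[1], reverse=True)
    if ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

paternal_haps = {"Hap I": {"s1": "A", "s2": "T", "s3": "C"},
                 "Hap II": {"s1": "G", "s2": "C", "s3": "C"}}
maternal_homozygous = {"s1": "G", "s2": "T", "s3": "C"}
plasma_alleles = {"s1": {"G", "A"}, "s2": {"T"}, "s3": {"C"}}
inherited = inherited_paternal_haplotype(paternal_haps, maternal_homozygous, plasma_alleles)
```

In this example the A allele at s1 is carried only by Hap I and is seen in the sample, so Hap I is returned.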
[0098] In step 140, a plurality of nucleic acid molecules from a biological sample obtained from the pregnant woman is analyzed. The sample contains a mixture of maternal and fetal nucleic acids. The maternal biological sample can be collected and then received for analysis. In one embodiment, maternal plasma and serum are used. In other embodiments, maternal blood, maternal urine, maternal saliva, uterine lavage fluid, or fetal cells obtained from maternal blood can be used.
[0099] In one embodiment, analyzing a nucleic acid molecule includes identifying the location of the nucleic acid molecule in the human genome, and determining the allele of the nucleic acid molecule at the respective locus. Thus, an embodiment can perform quantitative genotyping using the determined alleles of the nucleic acid molecules from the same locus. Any method that enables the determination of the genomic location and allele (genotype information) of nucleic acid molecules in the maternal biological sample can be used. Some such methods are described in U.S. Applications 12/178,181 and 12/614,350, and in the application entitled "Size-Based Genomic Analysis."
[00100] In step 150, based on the determined alleles of the nucleic acid molecules, amounts of the respective alleles at each of the first plurality of loci are determined. In one embodiment, the amounts can be the number of alleles of each type at a first locus, for example, six A and four T. In another embodiment, an amount can be a size distribution of nucleic acid molecules having a particular allele. For example, a relative amount can also include a size distribution of fragments with a particular genotype, which can convey a relative amount of fragments at certain lengths. Such relative amounts can also provide information about which genotype is in the fetal genome, since fetal fragments tend to be smaller than maternal fragments. Some examples of amounts and methods are described in U.S. Applications 12/178,181 and 12/614,350, and in the application entitled "Size-Based Genomic Analysis."
[00101] In one embodiment, the relative amounts of alleles at a locus can provide information about which genotype was inherited by the fetus (for example after a data set has reached sufficient statistical power). For example, the relative amounts can be used to determine whether a sequence imbalance occurs relative to the mother's genotype at a locus. The related patent applications cited above provide examples of embodiments for detecting a sequence imbalance at a particular locus or region.
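The count-based form of these amounts can be sketched as a simple tally. The input format, one (locus, allele) pair per analyzed fragment, and the function name `count_alleles` are assumptions for illustration.

```python
from collections import Counter, defaultdict

def count_alleles(observations):
    """Tally the alleles observed at each locus.

    observations: iterable of (locus, allele) pairs, one per analyzed
    fragment (hypothetical format).
    Returns a dict mapping each locus to a Counter of allele counts.
    """
    counts = defaultdict(Counter)
    for locus, allele in observations:
        counts[locus][allele] += 1
    return dict(counts)

# Ten fragments overlapping one locus: six carry A and four carry T.
observations = [("locus1", "A")] * 6 + [("locus1", "T")] * 4
counts = count_alleles(observations)
```

A size-based amount could be obtained in the same way by accumulating fragment lengths per allele instead of incrementing a count.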
[00102] In step 160, relative amounts of the respective alleles of the nucleic acid molecules at more than one locus of the first plurality of loci are compared. In some embodiments, the amounts of each allele at each locus of the first plurality of loci comprising the haplotypes are aggregated before making a comparison. The aggregated amounts of the parental haplotypes can then be compared to determine whether a haplotype is overrepresented, equally represented, or underrepresented. In other embodiments, the amounts of the alleles at a locus are compared, and comparisons at multiple loci are used. For example, a separation value (for example a difference or a ratio) can be aggregated, which can be used in a comparison with a cutoff value. Each of these embodiments can be applied to any of the comparison steps described here.
[00103] In various embodiments, the relative amounts can be a count of the number of fragments with a particular allele at a particular locus, a count of the number of fragments from any locus (or any loci in a region) on a particular haplotype, or a statistical value of the count (for example, an average) at a particular locus or on a particular haplotype. Thus, in one embodiment, the comparison can be a determination of a separation value (for example a difference or a ratio) of one allele versus another allele at each of the loci.
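The aggregation of per-locus allele amounts into per-haplotype amounts can be sketched as follows. The dictionaries, locus labels, and counts below are hypothetical, and only the count-based form of an amount is shown.

```python
def haplotype_counts(allele_counts, hap_a, hap_b):
    """Sum allele counts over loci to obtain one total per maternal haplotype.

    allele_counts: {locus: {allele: count}} from the maternal sample.
    hap_a, hap_b: {locus: allele} describing the two maternal haplotypes.
    Only loci at which the two haplotypes carry different alleles contribute.
    """
    total_a = total_b = 0
    for locus, allele_a in hap_a.items():
        allele_b = hap_b.get(locus)
        if allele_b is None or allele_a == allele_b:
            continue  # not heterozygous in the mother, hence uninformative
        per_locus = allele_counts.get(locus, {})
        total_a += per_locus.get(allele_a, 0)
        total_b += per_locus.get(allele_b, 0)
    return total_a, total_b

hap_a = {"s1": "A", "s2": "A", "s3": "G"}
hap_b = {"s1": "G", "s2": "G", "s3": "A"}
allele_counts = {"s1": {"A": 12, "G": 9},
                 "s2": {"A": 15, "G": 11},
                 "s3": {"G": 10, "A": 8}}
total_a, total_b = haplotype_counts(allele_counts, hap_a, hap_b)
difference = total_a - total_b   # one possible separation value
ratio = total_a / total_b        # another possible separation value
```

Either separation value could then be compared with a cutoff; the choice of cutoff is outside the scope of this sketch.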
[00104] In step 170, based on the comparison, the haplotype that is inherited by the unborn fetus from the mother in the portion of the genome covered by the first plurality of loci can be determined. In one embodiment, to determine which of the maternal chromosomes has been passed on to the fetus, a relative haplotype dosage (RHDO) method is used, for example, as mentioned above. As the mother is heterozygous at each of the first loci, the first loci correspond to two haplotypes for the genomic region of the first loci. The relative dosage of these haplotypes would be 1:1 if the sample were only from the mother. Deviations, or the lack of deviations, from this ratio can be used to determine the haplotype of the fetus that is inherited from the mother (and from the father, which is treated in more detail later). Thus, for a given chromosome or chromosomal region, the constituent alleles of these haplotypes can be sought from the analytical data (for example, sequencing data) generated in step 130.
[00105] Since a plurality of loci are analyzed and compared with the mother's haplotype, the sequence between the loci can be attributed to a particular haplotype. In one embodiment, if several loci are compatible with a particular haplotype, then the sequence segments between the loci can be assumed to be the same as those of the maternal haplotype. Because of the occurrence of meiotic recombination, the final haplotype inherited by the fetus may consist of a patchwork of ‘haplotype segments’ that originate from one of these two homologous chromosomes. The embodiments can detect such a recombination.
[00106] The resolution at which such a recombination can be detected depends on the number and distribution of the genetic markers that were determined in the constitutional DNA of the father or mother, and on the threshold that is used in the subsequent bioinformatics analysis (for example, using SPRT). For example, if the comparison suggests that the allele inherited from the mother at each of a first set of consecutive loci corresponds to the first haplotype, then the first haplotype is determined to be inherited for the genomic location that corresponds to the first set of loci. If a second set of consecutive loci suggests that the second haplotype is inherited, then the second haplotype is determined to be inherited for the genomic location that corresponds to the second set of loci.
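The assembly of consecutive classified sets of loci into haplotype segments, and the location of an apparent recombination between them, can be sketched as follows. The tuple layout (start coordinate, end coordinate, haplotype label) and the coordinates are hypothetical.

```python
def haplotype_segments(block_calls):
    """Merge consecutive blocks that share a haplotype and locate switches.

    block_calls: list of (start, end, haplotype) tuples in chromosomal order.
    Returns (segments, breakpoints). Each breakpoint is the interval between
    the end of one segment and the start of the next, within which a
    recombination appears to have occurred.
    """
    segments = []
    for start, end, haplotype in block_calls:
        if segments and segments[-1][2] == haplotype:
            # Extend the current segment instead of opening a new one.
            segments[-1] = (segments[-1][0], end, haplotype)
        else:
            segments.append((start, end, haplotype))
    breakpoints = [(left[1], right[0]) for left, right in zip(segments, segments[1:])]
    return segments, breakpoints

block_calls = [(100, 900, "Hap III"), (950, 1800, "Hap III"),
               (2000, 2700, "Hap IV"), (2750, 3500, "Hap IV")]
segments, breakpoints = haplotype_segments(block_calls)
```

The width of each breakpoint interval reflects the marker spacing discussed above: the denser the informative loci, the narrower the interval.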
[00107] In one embodiment, since a plurality of loci are analyzed, the haplotype can be determined with greater precision. For example, the statistical data for one of the loci may not be determinative, but when combined with the statistical data for other loci, a determination of which haplotype is inherited can be made. In another embodiment, each locus can be analyzed independently to make a classification, and then the classifications can be analyzed to provide a determination of which haplotype is inherited for a given region.
[00108] In one embodiment, a statistical procedure can be performed to determine the relative haplotype dosage (for example whether one of these haplotypes is overrepresented relative to the other haplotype). The classification threshold for this statistical procedure can be adjusted depending on the fractional fetal DNA concentration. In general, a higher fractional fetal DNA concentration can allow the threshold to be reached with fewer molecules. The classification threshold can also be adjusted depending on the number of successfully classified segments that one wishes to obtain across the genome or the genomic regions of interest.
[00109] Referring back to FIG. 1, in step 180, the fetal genome can be analyzed for mutations. For example, the embodiments can be used to search for a panel of mutations that cause genetic diseases in a particular population. Examples of mutations that can be detected using the embodiments can be found in Online Mendelian Inheritance in Man (OMIM). These mutations can be searched for during steps 140-160, or as a separate step as described herein. For example, in families where the father is a carrier of one or more mutations that are absent in the mother, the mutation(s) can be searched for in the analytical data (for example sequencing data) of the maternal biological sample.
[00110] Apart from detecting the actual mutation, one could also look for polymorphic genetic markers that are linked to the mutant or wild-type allele in the father or mother. For example, RHDO analysis may reveal that the fetus has inherited the maternal haplotype that is known to carry a mutation for a disease. Embodiments of the invention can also be used for the non-invasive prenatal diagnosis of diseases caused by deletions of chromosomal regions, for example the Southeast Asian deletion that causes alpha-thalassemia. In the scenario where both the father and the mother are carriers of the deletion, if the fetus is homozygous for the deletion, and if massively parallel sequencing is performed on the maternal plasma DNA, then there should be a reduction in the frequencies of DNA sequences originating from the deleted region in maternal plasma.
B. Example
[00111] This section describes an example of embodiments (for example method 100) applied to single nucleotide polymorphisms (SNPs) at which the mother is heterozygous. The SNP alleles on the same chromosome form a haplotype, with the mother having a homologous pair of each chromosome, and thus two haplotypes. To illustrate how such a determination is performed, consider a segment on chromosome 3, for example, as shown in FIG. 2.
[00112] FIG. 2 shows two haplotypes for the father and two haplotypes for the mother for a particular segment of their respective genomic code. Five SNPs were discovered within this segment in which the father and mother were homozygous and heterozygous, respectively, for all 5 of these SNPs. The two homologous chromosomes of the father had the same haplotype (Hap), that is, A-G-A-A-G (from top to bottom in FIG. 2). For simplicity, the paternal haplotypes are called Hap I and Hap II, bearing in mind that both of these are identical for this set of 5 SNPs. For the mother, two haplotypes were observed, namely Hap III, A-A-A-G-G and Hap IV, G-G-G-A-A.
[00113] The SNPs in this example can be further classified into two types. FIG. 3 shows the two types of SNPs according to the embodiments of the present invention. Type A consists of those SNPs in which the paternal alleles were the same as those in maternal haplotype III. Type B consists of those SNPs in which the paternal alleles were the same as those in maternal haplotype IV.
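Applying these two definitions to the haplotypes of FIG. 2 as given in the text (paternal A-G-A-A-G, Hap III A-A-A-G-G, Hap IV G-G-G-A-A) can be sketched as follows. The string encoding of the haplotypes and the function name `snp_types` are assumptions for illustration.

```python
def snp_types(paternal, hap_iii, hap_iv):
    """Label each SNP as Type A or Type B.

    paternal: alleles of the homozygous father, one character per SNP.
    hap_iii, hap_iv: alleles of the two maternal haplotypes.
    Type A: paternal allele equals the Hap III allele.
    Type B: paternal allele equals the Hap IV allele.
    None is returned for SNPs that fit neither configuration.
    """
    labels = []
    for p, a3, a4 in zip(paternal, hap_iii, hap_iv):
        if a3 == a4:
            labels.append(None)   # mother is homozygous at this SNP
        elif p == a3:
            labels.append("A")
        elif p == a4:
            labels.append("B")
        else:
            labels.append(None)   # paternal allele matches neither maternal allele
    return labels

types = snp_types("AGAAG", "AAAGG", "GGGAA")
```

For this segment the five SNPs alternate between the two types, from top to bottom: A, B, A, B, A.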
[00114] These two types of SNPs may require slightly different mathematical handling. Thus, in the Type A scenario, the fetal inheritance of haplotype III would result in the overrepresentation of haplotype III, relative to haplotype IV, in maternal plasma (FIG. 4A). For example, considering only SNP 410 for ease of discussion, the A allele is inherited from the father, and if Hap III is inherited from the mother, then the fetus will be contributing two A alleles to the sample, which will cause an overrepresentation of A. If the fetus inherited haplotype IV, then no overrepresentation would be observed, since the fetus would also be heterozygous, with A and G at the locus.
[00115] On the other hand, in the Type B scenario, the fetal inheritance of haplotype III would result in the equal representation of haplotype III and haplotype IV in maternal plasma (FIG. 4B). For example, considering only SNP 420, the inheritance of G from the father and of A as part of Hap III would cause the fetus to contribute equal amounts of A and G at SNP 420, just like the mother. If the fetus inherited haplotype IV, then overrepresentation would be observed, as is evident from the above discussion.
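The expected representation in the four cases can be worked out numerically. The sketch below assumes a known fractional fetal DNA concentration (the parameter `fetal_fraction`, with 0.1 used purely as an illustrative value) and ignores sampling noise; the function name is hypothetical.

```python
def expected_hap_iii_fraction(snp_type, inherited, fetal_fraction):
    """Expected fraction of fragments carrying the Hap III allele at one SNP.

    snp_type: "A" (paternal allele equals the Hap III allele) or "B".
    inherited: "III" or "IV", the maternal haplotype passed to the fetus.
    fetal_fraction: fractional fetal DNA concentration, between 0 and 1.
    """
    # The heterozygous mother contributes each allele equally.
    maternal_part = (1.0 - fetal_fraction) / 2.0
    # Number of Hap III alleles in the fetal genotype (0, 1 or 2).
    fetal_copies = (1 if inherited == "III" else 0) + (1 if snp_type == "A" else 0)
    return maternal_part + fetal_fraction * fetal_copies / 2.0

f = 0.1  # illustrative value only
type_a_iii = expected_hap_iii_fraction("A", "III", f)  # Hap III overrepresented
type_a_iv = expected_hap_iii_fraction("A", "IV", f)    # balanced
type_b_iii = expected_hap_iii_fraction("B", "III", f)  # balanced
type_b_iv = expected_hap_iii_fraction("B", "IV", f)    # Hap IV overrepresented
```

With a fractional concentration f, an overrepresented haplotype is thus expected at (1 + f)/2 of the fragments, and the other haplotype at (1 - f)/2.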
[00116] FIGS. 5A and 5B show an analysis that compares relative amounts (for example counts) of fragments for each locus and determines whether a result of the comparison classifies a particular haplotype as being inherited or not. Any genomic location at which there is a SNP that fits one of these genotypic configurations of the father and mother (for example the Type A or Type B scenarios) can be used for this example. From the maternal plasma sequencing data, one can focus on the number of sequenced molecules that correspond to a particular allele of the SNP. A SPRT analysis (or another comparison method) can be used to determine whether there is any allelic imbalance between these alleles (Lo YMD et al. Proc Natl Acad Sci USA 2007; 104: 13116-13121).
[00117] FIG. 5A shows an analysis for Type A SNPs. As shown, for each SNP, a SPRT comparison of the relative amounts (for example as defined by a separation value) with a cutoff value provides a classification. In one embodiment, if the classification threshold for SPRT has been reached, then the fetal inheritance of a particular maternal haplotype is concluded. The counts for the SPRT analysis can then be reset to zero. Then, the analysis can move on to a neighboring SNP that fits the required genotypic configuration, in the telomeric-to-centromeric direction or vice versa, and a new SPRT analysis can start with this next SNP.
[00118] On the other hand, in one embodiment, if the classification threshold for SPRT has not been reached with the SNP, then one can also move on to a neighboring SNP in a similar way, except that the counts for the next SNP are added to those of the previous one, and SPRT is then performed again. This process can continue until the classification threshold has been reached. FIG. 5A and FIG. 5B illustrate the operation of this process for Type A and Type B analyses. In one embodiment, the classifications are analyzed together to compose a total classification for a region. For example, if a classification is obtained for a first group of SNPs and for the next group of SNPs, the classifications of the two groups can be compared to see whether they are consistent.
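The accumulate-test-reset procedure can be sketched with a standard Wald sequential probability ratio test on a binomial proportion. The sketch assumes a known fractional fetal DNA concentration, a null proportion of 0.5 (balanced) and an alternative proportion of (1 + f)/2 (overrepresented); the function name, the per-SNP counts, and the threshold value passed in the example are all hypothetical.

```python
import math

def rhdo_blocks(snp_counts, fetal_fraction, threshold):
    """Walk along SNPs, pooling counts until the SPRT reaches a classification.

    snp_counts: list of (x, y) per SNP in chromosomal order, where x counts
    fragments from the haplotype that would be overrepresented if inherited
    and y counts fragments from the other haplotype.
    Returns a list of (first_index, last_index, call) blocks, where call is
    "overrepresented" or "balanced". Trailing unclassified SNPs are omitted.
    """
    q0 = 0.5
    q1 = (1.0 + fetal_fraction) / 2.0
    gain = math.log(q1 / q0)                   # log-likelihood ratio per x fragment
    loss = math.log((1.0 - q1) / (1.0 - q0))   # log-likelihood ratio per y fragment
    bound = math.log(threshold)
    blocks, llr, start = [], 0.0, 0
    for index, (x, y) in enumerate(snp_counts):
        llr += x * gain + y * loss             # add this SNP's counts to the pool
        if llr >= bound:
            call = "overrepresented"
        elif llr <= -bound:
            call = "balanced"
        else:
            continue                           # not yet classified, keep pooling
        blocks.append((start, index, call))
        llr, start = 0.0, index + 1            # reset and restart at the next SNP
    return blocks

snp_counts = [(30, 20), (28, 22), (35, 20), (24, 26), (22, 28), (23, 30)]
blocks = rhdo_blocks(snp_counts, fetal_fraction=0.1, threshold=8)
```

For Type A SNPs, with x taken as the Hap III count, an "overrepresented" call corresponds to inheritance of Hap III and a "balanced" call to inheritance of Hap IV; for Type B SNPs the roles of the two haplotypes are exchanged.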
[00119] FIG. 6 illustrates the effect of changing the likelihood ratio for SPRT classification (Zhou W et al. Nat Biotechnol 2001; 19: 78-81; Karoui NE et al. Statist Med 2006; 25: 3124-33). In general, a lower likelihood ratio for classification, for example 8, allows a classification to be made more easily. This can result in a larger number of classified regions within the genome. However, a number of such regions can be expected to be misclassified. On the other hand, a higher likelihood ratio for classification, for example 1200, allows a classification only when more SNPs have been counted. This can result in fewer classified regions within the genome. The number and proportion of misclassified regions can be expected to be lower than when a lower classification threshold is used.
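The cumulative procedure of the preceding paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: it assumes a generic binomial log-likelihood ratio between two hypothesized allele fractions, and the names `sprt_blocks`, `p_h1`, `p_h0` and the example counts are hypothetical. Counts are accumulated SNP by SNP, a block is classified once the likelihood ratio reaches the threshold (or its reciprocal), and the counts are then reset.

```python
import math

def sprt_blocks(snp_counts, p_h1, p_h0, threshold):
    """Classify consecutive SNPs into blocks by a cumulative SPRT.

    snp_counts: list of (count_x, count_y) per SNP, in chromosomal order,
                where count_x counts the alleles on the candidate haplotype.
    p_h1, p_h0: expected fraction of count_x under the two hypotheses
                (assumed values, e.g. derived from the fetal DNA fraction).
    threshold:  likelihood ratio required for a classification (e.g. 8 or 1200).
    Returns a list of (first_snp_index, last_snp_index, accepted_hypothesis).
    """
    log_t = math.log(threshold)
    blocks, start, x_total, n_total = [], 0, 0, 0
    for i, (x, y) in enumerate(snp_counts):
        x_total += x
        n_total += x + y
        llr = (x_total * math.log(p_h1 / p_h0)
               + (n_total - x_total) * math.log((1 - p_h1) / (1 - p_h0)))
        if llr >= log_t:
            blocks.append((start, i, "H1"))
        elif llr <= -log_t:
            blocks.append((start, i, "H0"))
        else:
            continue  # threshold not reached: carry the counts to the next SNP
        start, x_total, n_total = i + 1, 0, 0  # reset after a classification
    return blocks

counts = [(60, 40)] * 6  # hypothetical counts at six neighboring SNPs
low = sprt_blocks(counts, p_h1=0.55, p_h0=0.50, threshold=8)
high = sprt_blocks(counts, p_h1=0.55, p_h0=0.50, threshold=1200)
print(len(low), len(high))
```

With these made-up counts, the threshold of 8 yields three classified blocks of two SNPs each, whereas the threshold of 1200 yields a single block of five SNPs and leaves the sixth SNP unclassified.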
[00120] In one embodiment, a classification is made only if two consecutive SPRT classifications result in the same haplotype (referred to as the “two consecutive blocks” algorithm). In one aspect, the “two consecutive blocks” algorithm can increase the classification accuracy. In some embodiments, for any stretch of sequence, an embodiment may first perform an SPRT analysis for Type A SNPs, and then another SPRT analysis for Type B SNPs. In one embodiment, one can consider the scenario of a stretch of sequence in which the Type A and Type B SNPs form two interlacing groups of genetic landmarks (for example SNPs). In embodiments using the “two consecutive blocks” algorithm, the two blocks can be of different types.
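A minimal sketch of the “two consecutive blocks” rule, assuming the per-block SPRT calls are already available as an ordered list (the function name and the example calls are hypothetical): a haplotype is accepted for a pair of adjacent blocks only when both blocks gave the same call.

```python
def two_consecutive_blocks(block_calls):
    """For each pair of adjacent blocks, return the haplotype confirmed by
    both blocks, or None when the two SPRT classifications disagree."""
    return [curr if prev == curr else None
            for prev, curr in zip(block_calls, block_calls[1:])]

# Hypothetical SPRT calls for five consecutive blocks (Type A or Type B).
calls = ["Hap I", "Hap I", "Hap II", "Hap II", "Hap I"]
confirmed = two_consecutive_blocks(calls)
print(confirmed)
```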
[00121] The SPRT results from the Type A and Type B analyses allow one to check for concordance or discordance in their classification results. To enhance the classification accuracy, one embodiment (the “interlacing method”) makes a classification only if the Type A and Type B analyses for a given genomic region yield consistent results. If the two analyses yield discordant results, we can look at the classification results of the two contiguous classification regions next to the region, one on the centromeric side and the other on the telomeric side. If these two contiguous regions yield concordant results, then we can classify the first region as a continuous haplotype with these two regions. If these two contiguous regions do not yield concordant results, then we can move on to the next two contiguous regions until concordance is observed. A variant of this theme is to move in only one direction and to take the classification results of the next one, two, or even more contiguous regions as the results of the original region of interest. The general principle is to use the classification results of adjacent genomic regions to confirm the classification results of a particular region.
III. DETERMINATION OF PATERNAL ALLELES INHERITED BY THE FETUS
[00122] FIG. 7 is a flow chart of a method 700 of determining at least a portion of the genome of an unborn fetus of a pregnant woman that is inherited from the father. Method 700 analyzes nucleic acid molecules (fragments) from a biological sample obtained from the pregnant woman to determine the genome of the fetus. The sample contains a mixture of maternal and fetal nucleic acids.
[00123] In step 710, each of a plurality of nucleic acid molecules in the biological sample is analyzed to identify a locus of the nucleic acid molecule in the human genome, and to determine a type of allele of the nucleic acid molecule. Thus, the genotypes of the nucleic acid molecules at a particular location (locus) can be determined in one embodiment. Any of the methods described above and elsewhere can be used for this analysis.
[00124] In step 720, a first plurality of loci is determined at which the paternal genome is heterozygous and the maternal genome is homozygous. In one embodiment, the first plurality of loci is obtained by determining the paternal and maternal genomes. The genomes can be mined for genomic loci at which the father is heterozygous and the mother is homozygous.
[00125] In step 730, the haplotype that is inherited by the unborn fetus from the father in the portion of the genome covered by the first plurality of loci is determined based on the genotypes determined at the first plurality of loci. In one embodiment, the allele at each of these loci that is possessed by the father, but absent from the mother's genome, is searched for in the analytical data (for example, sequencing data). The combination of these alleles indicates the haplotypes of the chromosomes that the fetus inherited from the father.
[00126] In another embodiment, if the haplotypes of each of the chromosomes or chromosomal regions of interest in the father's genome are known, then one can also determine where meiotic recombination occurred during spermatogenesis in the father. In other words, paternal meiotic recombination is seen when the haplotype of a stretch of DNA on a paternally inherited chromosome differs between the fetus and the father. The inclusion of such recombination information can be useful when the analytical data (for example, sequencing data) are used for the prenatal diagnosis of a genetic disease by linkage analysis to genetic polymorphisms.
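The search for father-specific alleles in step 730 can be sketched as a simple vote. This is an illustration only; the data layout, the function name, and the example loci are hypothetical, and a real analysis would also have to handle sequencing errors and uneven coverage.

```python
def inherited_paternal_haplotype(loci, plasma_alleles):
    """Vote for the paternal haplotype whose father-specific alleles are seen.

    loci: list of dicts with keys 'locus', 'mother' (the allele for which the
          mother is homozygous), 'hap1' and 'hap2' (the father's two alleles).
    plasma_alleles: dict mapping locus -> set of alleles detected in plasma.
    Returns 'hap1', 'hap2', or None when the evidence is absent or tied.
    """
    votes = {"hap1": 0, "hap2": 0}
    for entry in loci:
        seen = plasma_alleles.get(entry["locus"], set())
        for hap in ("hap1", "hap2"):
            allele = entry[hap]
            # Only an allele absent from the mother's genome is informative.
            if allele != entry["mother"] and allele in seen:
                votes[hap] += 1
    if votes["hap1"] == votes["hap2"]:
        return None
    return max(votes, key=votes.get)

# Hypothetical loci: father heterozygous, mother homozygous.
loci = [
    {"locus": "snp1", "mother": "C", "hap1": "C", "hap2": "T"},
    {"locus": "snp2", "mother": "G", "hap1": "A", "hap2": "G"},
]
plasma = {"snp1": {"C", "T"}, "snp2": {"G"}}
print(inherited_paternal_haplotype(loci, plasma))
```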
[00127] IV. FATHER AND MOTHER ARE HETEROZYGOUS FOR A GENOMIC REGION
[00128] Embodiments can address a scenario in which the father and mother are both heterozygous for a genomic region. This scenario can be particularly relevant in families in which the father and mother are consanguineous. It can also be relevant when a disease is associated with a prevalent mutation that has resulted from a large founder effect. In such circumstances, it is to be expected that if the father and mother of the unborn fetus are both carriers of the mutant gene, then the haplotypes of the chromosomes carrying the mutant copy of the gene may be essentially identical, except where meiotic recombination events have occurred. This type of analysis can be especially useful for autosomal recessive diseases such as cystic fibrosis, beta-thalassemia, sickle cell anemia, and hemoglobin E disease.
[00129] FIG. 8 is a flow chart of a method 800 for determining at least a portion of the genome of an unborn fetus in a region where the mother and father are heterozygous according to the embodiments of the present invention.
[00130] In step 810, a first plurality of loci is determined at which the father and the mother are both heterozygous. In one embodiment, the primary loci can be determined by any of the methods mentioned herein. For example, all or parts of the parental genomes can be sequenced, or different parts genotyped, to find the primary loci. Thus, each of the two paternal haplotypes and each of the two maternal haplotypes at the first plurality of loci can be known.
[00131] As an example, FIG. 9 shows haplotypes of a father and mother who are both heterozygous in a particular genomic region. As shown, both parents have a mutant (allele) gene in region 1. Specifically, the father's Hap I and the mother's Hap III have the mutant gene. Also as shown, the father and mother can each have the other copy of the chromosome that carries the wild type copy of the gene. Specifically, the father's Hap II and the mother's Hap IV have the wild-type gene. Thus, this example has relevance in determining whether a fetus has inherited a mutant gene. The chromosomes of the father and mother that carry the wild-type gene have an identical haplotype in the immediate vicinity of the gene, but they could have divergent haplotypes more distant from the gene. Since this chromosome would probably have a different ancestral origin, this chromosome is unlikely to have identical haplotypes between the father and mother throughout the chromosome.
[00132] In step 820, a second plurality of loci are determined in which the father is heterozygous, but in which the mother is homozygous. As shown, the first and second pluralities of loci are on the same chromosome. Region 2 shows such secondary loci. Region 2 can be chosen such that the father is heterozygous for one or more SNPs in this region while the mother is homozygous in this region.
[00133] In step 830, fragments of a sample from the pregnant woman can be analyzed to identify a locus in the human genome and a genotype. The location can be used to determine whether a fragment (nucleic acid molecule) includes one or more of the primary loci or one or more of the secondary loci. This information can then be used to determine the haplotype inherited from the father and the haplotype inherited from the mother.
[00134] In step 840, which of the two paternal haplotypes has been inherited by the fetus is determined by analyzing the determined genotypes of the plurality of nucleic acid molecules from the biological sample at one or more of the secondary loci. For example, SNP alleles that are present in the father's genome but absent from the mother's genome, such as the T allele marked by * and the A allele marked by + in FIG. 9, can be searched for in the analytical data (for example the location and genotype that result from step 710) of the biological sample. As can be done for method 700, if the T allele marked by * is detected in maternal plasma, then Hap II is inherited by the fetus from the father. Conversely, if the A allele marked by + is detected in maternal plasma, then Hap I is inherited by the fetus from the father.
[00135] In step 850, the relative amounts of the determined genotypes of the nucleic acid molecules at more than one of the first plurality of loci are compared. In one embodiment, the amounts at each locus are aggregated and the relative amounts of the maternal haplotypes are compared. The relative amounts can refer to counted numbers, size distributions, or any other parameter that can convey information about which genotype is in the fetal genome at a particular locus.
[00136] In step 860, based on the paternal haplotype determined to be inherited by the fetus and on the comparison of relative amounts, the haplotype that is inherited by the unborn fetus from the mother in the portion of the genome covered by the first plurality of loci is determined. Thus, an analysis (for example as described above) of the SNPs in Region 1 in the analytical data of the maternal biological sample can be performed to determine which of the two maternal haplotypes has been inherited by the fetus, taking into consideration the paternal haplotype inherited by the fetus in Region 2. In one embodiment, it is assumed that there is no recombination between Regions 1 and 2 when these regions are passed from the parents to the fetus.
[00137] For example, consider the scenario in which the fetus has been determined, by analysis of Region 2, to have inherited Hap I from the father. Then, fetal inheritance of Hap III (which is identical to Hap I in Region 1) from the mother will result in overrepresentation of Hap III relative to Hap IV in maternal plasma. Conversely, if the fetus has inherited Hap IV from the mother, then equal representation of Hap III and Hap IV will be observed in maternal plasma.
[00138] As another example, consider the scenario when the fetus was determined to have inherited Hap II from the father by analyzing Region 2. Then, the fetal inheritance of Hap IV (which is identical to Hap II in Region 1) from the mother will result in the overrepresentation of Hap IV in relation to Hap III in maternal plasma. Conversely, if the fetus inherited Hap III from the mother, then equal representation of Hap III and Hap IV will be observed in the maternal plasma.
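The two examples above can be checked with a short calculation. The sketch below assumes that maternal DNA makes up a fraction 1 - f of the plasma DNA and contributes Hap III and Hap IV equally, that each fetal haplotype contributes f/2, and that in Region 1 Hap I is indistinguishable from Hap III and Hap II from Hap IV; the function name is hypothetical.

```python
def region1_fractions(f, paternal_hap, maternal_hap):
    """Expected plasma fractions of Hap III-type and Hap IV-type alleles in Region 1.

    f: fractional fetal DNA concentration.
    paternal_hap: 'I' or 'II' (inherited from the father, from Region 2 analysis).
    maternal_hap: 'III' or 'IV' (candidate haplotype inherited from the mother).
    """
    hap3 = (1 - f) / 2  # maternal DNA carries both haplotypes equally
    hap4 = (1 - f) / 2
    if paternal_hap == "I":      # Hap I matches Hap III in Region 1
        hap3 += f / 2
    else:                        # Hap II matches Hap IV in Region 1
        hap4 += f / 2
    if maternal_hap == "III":
        hap3 += f / 2
    else:
        hap4 += f / 2
    return hap3, hap4

for pat in ("I", "II"):
    for mat in ("III", "IV"):
        print(pat, mat, region1_fractions(0.10, pat, mat))
```

Under these assumptions, a fetus with Hap I and Hap III gives fractions of (1 + f)/2 and (1 - f)/2, while a fetus with Hap I and Hap IV gives equal fractions of 1/2, matching the overrepresentation and equal representation described in the text.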
[00139] In the previous sections, we deduced the fetal genome and the fractional concentration of fetal DNA using the data obtained from sequencing maternal plasma DNA together with the genotype information of the parents of the fetus. In the following sections, we describe embodiments for deducing the fractional concentration of fetal DNA and the fetal genotype without prior information on the maternal and paternal genotypes/haplotypes.
V. DETERMINATION OF FRACTIONAL CONCENTRATION OF FETAL DNA
[00140] In some embodiments, an optional step is to determine the fractional concentration of fetal DNA. In various aspects, this fractional concentration can guide the amount of analysis (for example the amount of sequencing required) or allow one to estimate the accuracy of the analysis for a given amount of data (for example, depth of genome sequence coverage). Determining the fractional concentration of fetal DNA can also be useful for determining a cutoff for classifying which haplotype and/or genotype is inherited.
[00141] In one embodiment, the fractional concentration of fetal DNA can be determined by mining the analytical data (for example as can be obtained in steps 140 and 710) for loci at which the father and the mother are both homozygous, but for different alleles. For example, for a SNP with two alleles, A and G, the father may be AA and the mother GG, or vice versa. For such loci, the fetus would necessarily be a heterozygote. In the example above, the fetal genotype would be AG, and the proportion of the A allele in the maternal sample can be used to determine the fractional concentration of fetal DNA. In another embodiment, a statistical analysis can be done to determine a locus at which the mother is homozygous and the fetus is heterozygous. Thus, no prior information about the maternal genome or the paternal genome is necessary.
[00142] As alternatives to mining the analytical data, the fractional concentration of fetal DNA can also be determined by another method, such as the use of PCR assays, digital PCR assays, or assays based on mass spectrometry, on a panel of polymorphic genetic markers (Lun FMF et al. Clin Chem 2008; 54: 1664-1672). Another alternative is to use one or more genomic loci that exhibit different DNA methylation between the fetus and the mother (Poon LLM et al. Clin Chem 2002; 48: 35-41; Chan KCA et al. Clin Chem 2006; 52: 2211-2218; US Patent 6,927,028). Yet another alternative is to use an approximate fractional concentration of fetal DNA determined from a reference population, for example at a similar gestational age. However, as the fractional concentration of fetal DNA can vary from sample to sample, this last method can be expected to be less accurate than measuring the concentration specifically for the sample being tested.
A. Determination of Fractional Concentration for Obligate Heterozygote
[00143] In embodiments where the fetus is an obligate heterozygote, the fractional concentration of fetal DNA can be determined using the following series of calculations (for example using massively parallel sequencing). Let p be the count of the fetal allele that is absent from the maternal genome. Let q be the count of the other allele, that is, the allele that is shared by the maternal and fetal genomes. The fractional concentration of fetal DNA is given by the following equation:
$$f = \frac{2p}{p + q}$$
[00144] In one implementation, this calculation can be performed on the cumulative data across different polymorphic genetic loci or polymorphic genetic features that fulfill the parental genotype configuration (for example both parents being homozygous, but for different alleles).
B. Determination Based on Informative SNPs
[00145] The fractional concentration of fetal DNA can also be determined for any locus at which the mother is homozygous and the fetus is heterozygous, and not just when the mother is homozygous for one allele and the father is homozygous for a different allele. Both methods provide a way of deciding whether a locus is informative. The term "informative SNP" can be used in different contexts depending on what information is desired. In one context, the information is an allele in the fetal genome at a particular locus that is not present in the maternal genome at that locus. Thus, the subset of SNPs at which the mother is homozygous and the fetus is heterozygous can be referred to as "informative SNPs" in the context of determining the fetal DNA concentration. Cases in which the mother and the fetus are both heterozygous, but for at least one different allele, can also be used as informative SNPs. However, triallelic SNPs are relatively uncommon in the genome.
[00146] FIG. 10 is a flow chart illustrating a method 1000 for determining the fractional concentration of fetal material in a maternal sample according to the embodiments of the present invention. In step 1010, fragments of a sample from the pregnant woman can be analyzed to identify a locus in the human genome and a type of allele (which can lead to genotype determination at the locus). In one embodiment, the fragments are analyzed by sequencing a plurality of nucleic acid molecules from the biological sample obtained from the pregnant woman. In other embodiments, real-time PCR or digital PCR can be used.
[00147] In step 1020, one or more primary loci are determined to be informative. In some embodiments, the maternal genome is homozygous, but a non-maternal allele is detected in the sample at an informative locus. In one embodiment, the fetal genome is heterozygous at each of the primary loci and the maternal genome is homozygous at each of the primary loci. For example, the fetal genome can have a respective first and second allele (for example TA) at a first locus, and the maternal genome can have two copies of the respective second allele (for example AA) at the first locus. However, such loci may not be known a priori, for example in situations where the fetus is not an obligate heterozygote.
[00148] In one embodiment for determining an informative locus, the SNPs at which the mother is homozygous are considered. At such SNPs, the fetus is either homozygous for the same allele or heterozygous. For example, if a SNP is polymorphic for A and T, and the mother has an AA genotype, the genotype of the fetus is either AA or TA. In this case, the presence of the T allele in the maternal plasma sample would indicate that the fetal genotype is TA rather than AA. Certain embodiments can address how much of a presence of the T allele indicates a TA genotype by calculating a required cutoff, as described below.
[00149] In step 1030, for at least one of the primary loci, a first number p of counts of the respective first allele and a second number q of counts of the respective second allele are determined. In one embodiment, the counts of the fetal-specific allele (the T allele) and the shared allele (the A allele) in maternal plasma can be determined by a variety of methods, for example, but not limited to, real-time PCR, digital PCR, and massively parallel sequencing.
[00150] In step 1040, the fractional concentration is calculated based on the first and second numbers. In one embodiment, for a pregnant woman with an AA genotype whose fetus has a TA genotype, the fractional concentration of fetal DNA (f) can be calculated using the equation f = 2 × p / (p + q), where p represents the counts for the fetal-specific allele (the T allele) and q represents the counts for the allele shared by the mother and the fetus (the A allele).
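A minimal sketch of the calculation in step 1040, together with a pooled estimate over several loci in the spirit of the cumulative calculation mentioned in paragraph [00144]. The function names and the example counts are hypothetical.

```python
def fetal_fraction(p, q):
    """f = 2p / (p + q) for one locus (p: fetal-specific allele, q: shared allele)."""
    return 2 * p / (p + q)

def pooled_fetal_fraction(counts):
    """Pool the counts of several informative loci before applying the formula.

    counts: list of (p, q) pairs, one per informative locus.
    """
    total_p = sum(p for p, _ in counts)
    total = sum(p + q for p, q in counts)
    return 2 * total_p / total

# Hypothetical allele counts at two informative loci.
print(fetal_fraction(12, 188))                     # single locus
print(pooled_fetal_fraction([(12, 188), (8, 92)])) # two loci pooled
```

Pooling the counts before dividing weights each locus by its depth, so a shallowly covered locus does not dominate the estimate as it could in a simple average of per-locus values.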
[00151] In another embodiment, by using multiple informative SNPs, the fractional concentration of fetal DNA in maternal plasma can be estimated with increased accuracy. When the allele counts of multiple SNPs (a total of n SNPs) are used, the fractional concentration of fetal DNA (f) can be calculated using the equation
$$f = \frac{2\sum_{i=1}^{n} p_i}{\sum_{i=1}^{n} (p_i + q_i)}$$
where p_i represents the counts for the fetal-specific allele for informative SNP i; q_i represents the counts for the allele shared by the mother and the fetus for informative SNP i; and n represents the total number of informative SNPs. The use of the allele counts of multiple SNPs can increase the accuracy of the estimate of the fractional concentration of fetal DNA.
C. Fractional Concentration Without Explicit Genetic Information From Parents
[00152] A method for determining the fractional concentration of fetal DNA in a maternal plasma sample that does not require prior information regarding the genotypes of the fetus and the mother is now described. In one embodiment, informative SNPs are identified from the counts of the different alleles at these SNP loci in maternal plasma. Thus, method 1000 can be used, together with the determination of informative SNPs based on the embodiments described below. First, a description of probabilities is provided to help in understanding the calculation of a cutoff that is used to identify informative SNPs.
[00153] In one embodiment, the probability of detecting the fetal-specific allele follows the Poisson distribution. The probability (P) of detecting the fetal-specific allele can be calculated using the following equation: P = 1 - exp(-f × N / 2), where f represents the fractional concentration of fetal DNA in the maternal plasma sample, N represents the total number of molecules corresponding to the particular SNP locus being analyzed, and exp() represents the exponential function. In one aspect, P can be considered an expected distribution since it is not a distribution that results from measuring a number of molecules across many samples. In other embodiments, other distributions can be used.
[00154] Assuming that the fractional concentration of fetal DNA is 5% (a typical value for the first trimester of pregnancy) and that 100 molecules (maternal + fetal) corresponding to this SNP locus are analyzed (equivalent to the amount contained in 50 diploid genomes), the probability of detecting the fetal-specific allele (the T allele) is 1 - exp(-0.05 × 100 / 2) = 0.92. The probability of detecting the fetal-specific allele increases with the fractional concentration of fetal DNA and with the number of molecules analyzed for the SNP locus. For example, if the fetal DNA concentration is 10% and 100 molecules are analyzed, the probability of detecting the fetal-specific allele is 0.99.
[00155] Therefore, at a SNP locus at which the mother is homozygous, the presence in maternal plasma of an allele other than the maternal one can indicate that the SNP is "informative" for calculating the fractional concentration of fetal DNA. The probability of missing any informative SNP depends on the number of molecules analyzed. In other words, for any desired confidence of detecting the informative SNPs, the number of molecules that needs to be analyzed can be calculated according to the Poisson probability function.
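The detection probability and its inverse can be sketched as below. The first function is the equation P = 1 - exp(-f × N / 2) from the text; the second simply solves that equation for N and rounds up, and its name and interface are assumptions made for illustration.

```python
import math

def detection_probability(f, n_molecules):
    """Poisson probability of seeing the fetal-specific allele at least once."""
    return 1 - math.exp(-f * n_molecules / 2)

def molecules_needed(f, confidence):
    """Smallest N for which the detection probability reaches `confidence`,
    obtained by solving P = 1 - exp(-f * N / 2) for N."""
    return math.ceil(-2 * math.log(1 - confidence) / f)

print(round(detection_probability(0.05, 100), 2))  # 5% fetal DNA, 100 molecules
print(round(detection_probability(0.10, 100), 2))  # 10% fetal DNA, 100 molecules
print(molecules_needed(0.05, 0.99))                # N for 99% detection at 5%
```

The first two values reproduce the 0.92 and 0.99 quoted in the text.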
[00156] Using the above analysis, some embodiments can determine whether a locus is informative or not when the mother's genotype is not known. In one embodiment, loci in which two different alleles are detected in the maternal plasma sample are identified. For example, for a SNP locus with two possible alleles A and T, both the A and T alleles are detected in maternal plasma.
[00157] FIG. 11 is a flow chart of a method 1100 for determining whether a locus is informative according to embodiments of the present invention. In one embodiment, method 1100 can be used to implement step 1020 of method 1000. In one embodiment, one step of method 1100 determines a cutoff value based on a statistical distribution, and another step uses the cutoff value to determine whether a locus (SNP) is informative.
[00158] In step 1110, a cutoff value is determined for a number of predicted counts of the respective first allele at the specific locus. In one implementation, the cutoff value predicts whether the maternal genome is homozygous and the fetal genome is heterozygous. In one embodiment, the cutoff value is determined based on a statistical distribution of numbers of counts for different combinations of homozygosity and heterozygosity at the specific locus. For example, an allelic frequency distribution can be predicted using the Poisson distribution function.
[00159] In step 1120, based on an analysis of the nucleic acid molecules of the maternal sample (for example from step 1010), a first allele and a second allele are detected at the locus. For example, a set of fragments can be mapped to the locus being analyzed, and the first allele or the second allele detected in each. The first allele can correspond to one of the respective first alleles of step 1020, and the second allele can correspond to one of the respective second alleles. In one embodiment, if two different alleles are not detected, then the locus is known not to be informative.
[00160] In step 1130, a number of actual counts of the respective first allele at the locus is determined based on the analysis of the nucleic acid molecules. For example, the sequencing results for the plurality of nucleic acid molecules can be counted to determine the number of times that a fragment having the first allele is mapped to the locus.
[00161] In step 1140, the locus is identified as one of the primary loci based on a comparison of the number of actual counts to the cutoff value. In one aspect, a cutoff value can be used to differentiate between three possibilities: (a) the mother is homozygous (AA) and the fetus is heterozygous (AT); (b) the mother is heterozygous (AT) and the fetus is heterozygous (AT); and (c) the mother is heterozygous (AT) and the fetus is homozygous, either (AA) or (TT). For the sake of illustration, the examples below assume that the fetal genotype is AA in scenario (c). However, the calculation would be the same if the fetal genotype were TT. An informative locus corresponds to possibility (a).
[00162] In one embodiment, the locus is identified as one of the primary loci when the number of actual counts is less than the cutoff value. In another embodiment, a lower threshold can also be used to ensure that spurious mapping does not occur.
[00163] An embodiment for determining the cutoff is now described. Based on the physiologically possible fractional concentration of fetal DNA (this information is available from previous studies) and the total number of molecules corresponding to the SNP locus, the distribution of allelic counts can be predicted for the three possible scenarios above. Based on the predicted distribution, a cutoff value can be determined for interpreting the allelic counts observed in maternal plasma to determine whether a SNP is "informative" (i.e. scenario (a)) or not.
[00164] The fractional concentration of fetal DNA typically ranges from 5% to 20% in early pregnancy and from 10% to 35% in late pregnancy (Lun et al., Microfluidics digital PCR reveals a higher than expected fraction of fetal DNA in maternal plasma. Clin Chem 2008; 54: 1664-72). Thus, in one embodiment, the predicted distributions of the allelic counts were determined for fractional fetal DNA concentrations of 5% and 20%.
[00165] FIG. 12A shows the predicted distribution of the counts for the T allele (the less abundant allele in scenarios (a) and (c)) for the three scenarios with an assumed fractional fetal DNA concentration of 20%. FIG. 12B shows the predicted distribution of the counts for the T allele (the less abundant allele in scenarios (a) and (c)) for the three scenarios with an assumed fractional fetal DNA concentration of 5%. In both predicted models, a total of 200 molecules is assumed to be analyzed for the SNP locus.
[00166] Using a count of 40 for the less abundant allele (the T allele) as a cutoff, the three possibilities can be statistically discriminated. In other words, for any SNP locus with two alleles detected in maternal plasma and a total of 200 molecules analyzed, if the count of the minor allele (the less abundant allele) is less than 40, the SNP locus can be considered “informative”. For fractional fetal DNA concentrations of 5% and 20%, the differentiation of “informative” SNPs (scenario (a)) from SNPs at which the mother is heterozygous (scenarios (b) and (c)) would be 100% accurate.
[00167] In practice, the total number of molecules detected may differ between SNPs. For each SNP locus, a specific predicted distribution curve can be constructed by taking into account the total number of molecules detected in the maternal plasma sample that cover the SNP locus. In other words, the count cutoff for determining whether a SNP is informative can vary between SNPs and depends on the number of times the SNP locus has been counted.
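The predicted distributions and the effect of a cutoff can be explored with a short sketch. It assumes, for illustration, that the T-allele count in each scenario is Poisson-distributed with mean N × f / 2 in scenario (a), N / 2 in scenario (b), and N × (1 - f) / 2 in scenario (c); these means and all function names are assumptions rather than values taken from the figures.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of observing exactly k counts."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def expected_t_counts(total, f):
    """Expected T-allele counts under scenarios (a), (b) and (c)."""
    return {
        "a": total * f / 2,        # mother AA, fetus AT
        "b": total / 2,            # mother AT, fetus AT
        "c": total * (1 - f) / 2,  # mother AT, fetus AA
    }

def error_rates(total, f, cutoff):
    """Return (P[informative SNP has count >= cutoff],
               P[heterozygous-mother SNP has count < cutoff])."""
    means = expected_t_counts(total, f)
    below = {s: sum(poisson_pmf(k, m) for k in range(cutoff))
             for s, m in means.items()}
    return max(0.0, 1 - below["a"]), max(below["b"], below["c"])

for f in (0.05, 0.20):
    miss, false_call = error_rates(total=200, f=f, cutoff=40)
    print(f"f={f:.2f}: miss={miss:.1e}, false call={false_call:.1e}")
```

Because `total` is a parameter, the same functions can be re-run for each SNP with its own number of detected molecules to see how a locus-specific cutoff would behave.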
[00168] The following table shows the allele counts of three SNP loci in maternal plasma for a sample of maternal plasma that was sequenced. For each of the three SNPs, two different alleles are detected in the maternal plasma sample. The total numbers of counts detected in the maternal plasma corresponding to these three SNPs are different.

[00169] The predicted distributions of the counts for the less abundant allele, for a fractional fetal DNA concentration of 20% and different total counts of molecules corresponding to a SNP, are shown in FIGS. 13A, 13B, and 14. The predicted distributions were plotted using an assumed fetal DNA concentration of 20% because this represents the upper limit of the fetal DNA concentration in the first trimester. The higher the fetal DNA concentration, the more overlap is expected between the distribution curve of the minor allele when the mother is homozygous for the major allele and that when the mother is heterozygous. Thus, it is more specific to derive the cutoffs for the minor allele counts using a higher fetal DNA concentration for the prediction of informative SNPs.
[00170] FIG. 13A shows a predicted distribution of the counts for the less abundant allele with a total of 173 molecules and a fractional fetal DNA concentration of 20%. In one embodiment, based on this distribution, a cutoff criterion of less than 40 for the counts of the less abundant allele can be suitable for identifying informative SNPs. Since the count for the A allele is 10, SNP locus no. 1 is regarded as “informative” for calculating the fractional concentration of fetal DNA.
[00171] FIG. 13B shows a predicted distribution of the counts for the less abundant allele with a total of 121 molecules and a fractional fetal DNA concentration of 20%. In one embodiment, based on this distribution, a cutoff criterion of less than 26 for the counts of the less abundant allele can be suitable for identifying informative SNPs. Since the count for the T allele is 9, SNP locus no. 2 is regarded as “informative” for calculating the fractional concentration of fetal DNA.
[00172] FIG. 14 shows a predicted distribution of the counts for the less abundant allele with a total of 134 molecules and a fractional fetal DNA concentration of 20%. In one embodiment, based on this distribution, a cutoff criterion of less than 25 for the counts of the less abundant allele can be suitable for identifying informative SNPs. Since the count for the T allele is 62, SNP locus no. 3 is regarded as “not informative” and would not be used for calculating the fractional concentration of fetal DNA.
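The three classifications above can be reproduced with a simple comparison. In this sketch the counts and cutoffs are those quoted in the text, whereas the function name and the `min_count` lower bound (standing in for the optional lower threshold against spurious mapping) are hypothetical.

```python
def is_informative(minor_count, cutoff, min_count=1):
    """A locus is called informative when the less abundant allele is seen
    at least `min_count` times but fewer than `cutoff` times."""
    return min_count <= minor_count < cutoff

# SNP number -> (count of the less abundant allele, locus-specific cutoff)
snps = {1: (10, 40), 2: (9, 26), 3: (62, 25)}
calls = {snp: is_informative(count, cutoff)
         for snp, (count, cutoff) in snps.items()}
print(calls)
```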
[00173] In some embodiments, using the equation f = 2 × p / (p + q), the fractional concentration of fetal DNA can be calculated from the allele counts for SNPs 1 and 2, individually and combined. The results are shown below.
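As a worked check, the values can be recomputed from the counts quoted in the preceding paragraphs. This assumes that the less abundant allele at SNPs 1 and 2 is the fetal-specific allele (p) and that the total number of molecules at each locus equals p + q; the figures are recalculated here rather than copied from the original results table.

```latex
\begin{align*}
f_1 &= \frac{2 \times 10}{173} \approx 11.6\% \\
f_2 &= \frac{2 \times 9}{121} \approx 14.9\% \\
f_{1+2} &= \frac{2 \times (10 + 9)}{173 + 121} = \frac{38}{294} \approx 12.9\%
\end{align*}
```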
D. Determination of Depth of Coverage of the Fetal Genome
[00174] In addition to obtaining a fractional concentration, embodiments can determine the percentage coverage of the fetal genome achieved by the analytical procedure (for example sequencing) in step 1010. In some embodiments, informative loci can be used to determine the percentage coverage. For example, any of the examples above can be used. In one embodiment, loci at which the fetus is an obligate heterozygote can be used. In another embodiment, loci at which the fetus is determined to be heterozygous and the mother is homozygous can be used (for example using method 1100).
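One simple way to turn informative loci into a coverage estimate is to take the proportion of such loci at which the fetal-specific allele was observed at least once. The sketch below assumes this per-locus detection flag has already been computed; the function name and the example flags are hypothetical.

```python
def fetal_genome_coverage(fetal_allele_detected):
    """Proportion of informative loci at which the fetal-specific allele was seen.

    fetal_allele_detected: list of booleans, one per informative locus.
    """
    if not fetal_allele_detected:
        raise ValueError("at least one informative locus is required")
    return sum(fetal_allele_detected) / len(fetal_allele_detected)

# Hypothetical detection flags for four informative loci.
print(fetal_genome_coverage([True, True, False, True]))
```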
[00175] The fragments that have been mapped to the informative loci can be used to determine a proportion of coverage. In one embodiment, the proportion of loci of the first plurality of loci at which a respective first allele is detected in the sequencing results is determined. For example, if the fetus is TA at a locus and the mother is AA at that locus, then the T allele should be detected in the sequencing results if that locus has been sequenced. Thus, the proportion of the fetal genome that has been sequenced from the biological sample can be calculated based on this proportion. In one embodiment, the proportion of the primary loci at which the fetal-specific allele is seen can be taken as the percentage coverage of the fetal genome. In other embodiments, the proportion can be modified based on where the loci are located. For example, a percentage coverage can be determined for each chromosome. As another example, the percentage can be estimated to be less than the proportion if the primary loci do not form a good representation of the genome. As another example, a range can be provided in which the proportion is one end of the range. Although a high percentage, i.e. approaching 100%, signifies nearly complete coverage of the fetal genome, most genetic diseases can be diagnosed with much less than 100% coverage, for example 80%, or 50%, or less.
VI. NO PRIOR INFORMATION OF MATERNAL AND PATERNAL GENOMES
[00176] In previous sections, some embodiments determined a genetic map of a fetus (or a portion of a fetal genome) when the mother's haplotypes and the father's genotypes are known. Other embodiments demonstrated that the fractional concentration of fetal DNA can be determined by analyzing maternal plasma DNA without prior knowledge of the genotypes of the mother, father, or fetus. In yet other embodiments, we now describe a method for determining the genetic map of a fetus (or a portion of a fetal genome) using RHDO analysis without prior information on the maternal and paternal genotypes/haplotype(s).
[00177] In one embodiment, information on reference haplotypes (for example, common or known haplotypes) of the population to which the parents belong is used. This information can be used to deduce the maternal and paternal haplotypes. An example is used to illustrate the principle of this method. Information regarding such reference haplotypes can be obtained, for example, from the International HapMap Project website (hapmap.ncbi.nlm.nih.gov/).
[00178] As part of an illustrative example, assume that three reference haplotypes (Hap A, Hap B and Hap C as shown in FIG. 15A) are present in the population. Each of these three haplotypes consists of 14 SNP loci and, for each locus, there are two possible alleles. In this example, the father has Hap B and Hap C while the mother has Hap A and Hap B, as shown in FIG. 15B. This example assumes that the fetus inherits Hap A from the mother and Hap C from the father. Therefore, the fetus has Hap A and Hap C, as shown in FIG. 15B.
[00179] FIG. 16 is a flow chart of a method 1600 for determining at least part of a fetal genome when a set of reference haplotypes is known, but the parental haplotypes are not known, according to embodiments of the present invention.
[00180] In step 1610, the maternal sample can be analyzed to identify SNPs at which the mother is homozygous and the fetus is heterozygous. This analysis can be performed in a manner similar to the determination of whether a locus is informative, as described above. Thus, in one embodiment, methods 1000 and/or 1100 can be used. In other embodiments described above, the maternal and paternal genomes can be analyzed to determine the information needed to carry out the fetal genome mapping.
[00181] FIG. 17 shows an example of determining informative loci from an analysis of DNA fragments from a maternal sample. For each of the 14 loci, the counts of the two alleles are determined. These counts can be determined using, for example but not limited to, real-time PCR, digital PCR, and massively parallel sequencing. For each informative locus, two different alleles would be detected in maternal plasma. Unlike SNPs at which the mother is heterozygous, the proportions of the two alleles would be significantly different. The fetus-specific allele (the allele that the fetus inherits from the father) would be much less abundant than the maternal allele. Informative loci 1710 are marked in FIG. 17.
[00182] In step 1620, one or more alleles of the paternal haplotype inherited by the fetus are deduced. In one embodiment, each of the loci 1710 can be used to determine the inherited paternal haplotype. For example, the paternal allele that the fetus inherited can be identified as the fetus-specific allele for the loci 1720, because the fetus-specific allele is the allele that is much less abundant than the maternal allele in the maternal sample.
[00183] In step 1630, the paternal alleles are compared with the reference haplotypes to determine the haplotype inherited from the father. In certain embodiments, several possible fetal haplotypes can be deduced, each with its own probability. One or more of the most likely fetal haplotypes can then be used for subsequent analysis, or for clinical diagnosis.
[00184] In the example shown in FIG. 18, there are three possible haplotypes (Hap A, Hap B and Hap C) in the population. From the analysis of maternal plasma, four SNPs were identified at which the mother is homozygous and the fetus is heterozygous; the fetus-specific alleles at these SNPs thus represent the paternal alleles that the fetus inherited. The alleles at these four SNPs fit the pattern of Hap C. Therefore, the fetus inherited Hap C from the father, as shown in FIG. 19. In other words, the paternal alleles that the fetus inherited can be deduced for all SNPs within the same haplotype block.
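As a non-limiting illustration, the comparison of step 1630 can be sketched in Python as follows. The sketch keeps every reference haplotype that agrees with all of the observed fetus-specific alleles. The six-locus haplotypes, the 0-based locus indices and the function name `consistent_haplotypes` are hypothetical and are not the haplotypes of FIG. 15A. Exact matching is a simplifying assumption; an embodiment may instead assign a probability to each candidate haplotype.

```python
# Hypothetical six-locus reference haplotypes (NOT those of FIG. 15A).
REFERENCE_HAPLOTYPES = {
    "Hap A": "AGGTCA",
    "Hap B": "CGATCC",
    "Hap C": "CTATGA",
}


def consistent_haplotypes(fetus_specific_alleles, reference_haplotypes):
    """Return the names of reference haplotypes matching every observed allele.

    fetus_specific_alleles maps a 0-based locus index to the fetus-specific
    (paternally inherited) allele seen at that informative locus.
    """
    return [
        name
        for name, alleles in reference_haplotypes.items()
        if all(alleles[i] == allele for i, allele in fetus_specific_alleles.items())
    ]


# Fetus-specific alleles observed at two informative loci.
observed = {1: "T", 4: "G"}
print(consistent_haplotypes(observed, REFERENCE_HAPLOTYPES))
```

If more than one haplotype remains consistent, additional informative loci would be needed to discriminate between them.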
[00185] In step 1640, the loci (for example, SNPs) at which the mother is heterozygous can be determined. In one embodiment, the analysis of the maternal sample can provide the SNPs for which the mother is heterozygous. For example, at each of these SNPs, two different alleles can be detected in maternal plasma. Unlike SNPs at which the mother is homozygous and the fetus is heterozygous, where the fetus-specific allele contributes only a small proportion of the total alleles in maternal plasma, the counts of the two alleles would be similar for SNPs at which the mother is heterozygous. Thus, the complete maternal genotype for all SNP loci within the haplotype block can be determined from the analysis of maternal plasma, for example, as shown in FIG. 20.
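The distinction drawn in steps 1610 and 1640 between the two kinds of loci can be sketched as below. This is an illustration only: the function name `classify_locus` and both thresholds (`het_min`, `noise_max`) are hypothetical values chosen for the example, and an embodiment would more likely use a statistical test that accounts for sequencing depth and error rate.

```python
def classify_locus(count_allele_1, count_allele_2, het_min=0.35, noise_max=0.01):
    """Classify a SNP locus from the two allele counts seen in maternal plasma.

    het_min and noise_max are hypothetical cutoffs on the minor-allele
    fraction, used here only to illustrate the principle.
    """
    total = count_allele_1 + count_allele_2
    if total == 0:
        return "no data"
    minor_fraction = min(count_allele_1, count_allele_2) / total
    if minor_fraction >= het_min:
        # Similar counts for the two alleles: the mother is heterozygous.
        return "mother heterozygous"
    if minor_fraction > noise_max:
        # A much less abundant second allele: the fetus-specific allele.
        return "mother homozygous, fetus heterozygous"
    return "mother homozygous, no fetus-specific allele detected"


print(classify_locus(98, 102))
print(classify_locus(190, 10))
```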
[00186] In step 1650, the maternal haplotypes are deduced from the maternal genotypes of step 1640 by comparing the genotypes at these loci with the haplotype information of the relevant population. FIG. 21 shows an embodiment for determining the maternal haplotypes from the maternal genotypes and the reference haplotypes. In the example used, the mother is homozygous for the G allele at the third SNP locus. Since only Hap A and Hap B meet this criterion, the mother has one of three haplotype combinations, namely Hap A/Hap A, Hap A/Hap B or Hap B/Hap B. Furthermore, since the mother is heterozygous for A and C at the first SNP, we can deduce that the mother has the Hap A/Hap B haplotype combination. In one embodiment, more than one possibility would result, and each possibility would be tested in the next step. From the analyses above, the mother's haplotypes and the haplotype that the fetus inherits from the father have been determined. FIG. 22 shows the determined maternal haplotypes and the paternally inherited haplotype.
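The deduction of step 1650 can be sketched as an enumeration of reference haplotype pairs, keeping those pairs that reproduce the observed maternal genotype at every locus. The haplotypes below are hypothetical six-locus examples (not those of FIG. 15A), and the function name `candidate_haplotype_pairs` is an assumption made for this illustration.

```python
from itertools import combinations_with_replacement

# Hypothetical six-locus reference haplotypes (NOT those of FIG. 15A).
REFERENCE_HAPLOTYPES = {
    "Hap A": "AGGTCA",
    "Hap B": "CGATCC",
    "Hap C": "CTATGA",
}


def candidate_haplotype_pairs(maternal_genotypes, reference_haplotypes):
    """Return every pair of reference haplotypes that explains the genotypes.

    maternal_genotypes lists, per locus, the alleles seen for the mother,
    e.g. "AC" for a heterozygous locus and "G" for a homozygous locus.
    """
    pairs = []
    items = reference_haplotypes.items()
    for (name_1, hap_1), (name_2, hap_2) in combinations_with_replacement(items, 2):
        if all(
            {a, b} == set(genotype)
            for a, b, genotype in zip(hap_1, hap_2, maternal_genotypes)
        ):
            pairs.append((name_1, name_2))
    return pairs


# Genotypes of a mother who carries Hap A and Hap B.
genotypes = ["AC", "G", "AG", "T", "C", "AC"]
print(candidate_haplotype_pairs(genotypes, REFERENCE_HAPLOTYPES))
```

When several pairs are returned, each one would be carried forward and tested in the subsequent step, as noted above.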
[00187] In step 1660, the maternal haplotype inherited by the fetus is determined from the maternal haplotypes identified in step 1650 and the paternally inherited haplotype identified in step 1630. Using this information, an embodiment can use RHDO analysis to determine which maternal haplotype is passed on to the fetus. An RHDO analysis can be performed according to any of the embodiments described herein.
[00188] In one embodiment, for the RHDO analysis, the SNPs at which the mother is heterozygous can be divided into two types, namely alpha type and beta type (for example, as shown in FIG. 23 and as described above). Alpha-type SNPs are those loci at which the paternal allele passed on to the fetus is identical to the maternal allele located on Hap A. For alpha-type SNPs, if the fetus inherits Hap A from the mother, the allele on Hap A would be overrepresented in maternal plasma. On the other hand, if the fetus inherits Hap B from the mother, the two maternal alleles would be equally represented in maternal plasma.
[00189] Beta-type SNPs are those loci at which the paternal allele passed on to the fetus is identical to the maternal allele located on Hap B. For beta-type SNPs, if the fetus inherits Hap B from the mother, the allele on Hap B would be overrepresented in maternal plasma. However, if the fetus inherits Hap A from the mother, the two maternal alleles would be equally represented in maternal plasma. The potential overrepresentation of the Hap A or Hap B alleles can be determined using RHDO analysis.
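The alpha/beta classification and the plasma pattern expected for each type can be sketched as follows. The haplotype strings and the function names are hypothetical and serve only to illustrate the two paragraphs above.

```python
def classify_maternal_het_snps(maternal_hap_a, maternal_hap_b, paternal_inherited):
    """Label each maternal heterozygous locus (0-based index) as alpha or beta."""
    types = {}
    loci = zip(maternal_hap_a, maternal_hap_b, paternal_inherited)
    for index, (allele_a, allele_b, paternal) in enumerate(loci):
        if allele_a == allele_b:
            continue  # mother is homozygous here; not used for RHDO
        if paternal == allele_a:
            types[index] = "alpha"
        elif paternal == allele_b:
            types[index] = "beta"
    return types


def expected_plasma_pattern(snp_type, inherited_maternal_hap):
    """Describe the allelic balance expected in maternal plasma."""
    overrepresented = {"alpha": "Hap A", "beta": "Hap B"}[snp_type]
    if inherited_maternal_hap == overrepresented:
        return f"{overrepresented} allele overrepresented"
    return "both maternal alleles equally represented"


# Hypothetical maternal Hap A, maternal Hap B and inherited paternal haplotype.
print(classify_maternal_het_snps("AGGTCA", "CGATCC", "CTATGA"))
print(expected_plasma_pattern("alpha", "Hap A"))
```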
[00190] In some embodiments, to apply RHDO analysis to a particular region without prior information on the maternal haplotypes and paternal genotypes, a relatively high-fold coverage of the SNPs within the haplotype block may be required; for example, 200 molecules corresponding to a SNP locus may need to be analyzed in one embodiment. This information can be obtained by, for example but not limited to, real-time PCR, digital PCR and massively parallel sequencing. In one embodiment, targeted sequencing (for example, by a combination of target enrichment and massively parallel sequencing) can be used to obtain unbiased and quantitative information on different alleles within the targeted region. An example below describes targeted sequencing. Therefore, this RHDO analysis can be applied to targeted sequencing data of maternal plasma DNA to determine which alleles/haplotypes are passed on to the fetus without prior information on the parental genotypes/haplotypes.
VII. DETECTION OF DE NOVO MUTATION
[00191] Some embodiments can detect a mutation that the fetus has acquired. A de novo mutation is a mutation that is not carried by the father or mother, but is produced, for example, during the gametogenesis of the father or mother or both. Such detection is of clinical use because de novo mutations play a significant role in causing various genetic diseases, for example hemophilia A and achondroplasia.
[00192] FIG. 24 is a flow chart illustrating a method 2400 of identifying a de novo mutation in the genome of an unborn fetus of a pregnant woman. The fetus has a father and a mother, the mother being the pregnant woman; the father has a paternal genome with two haplotypes and the mother has a maternal genome with two haplotypes.
[00193] In step 2410, a plurality of nucleic acid molecules from a biological sample obtained from the pregnant woman is sequenced. Note that the sample contains a mixture of maternal and fetal nucleic acids.
[00194] In step 2420, the location in the human genome of each of the sequenced nucleic acid molecules is identified. In one embodiment, the sequences can be mapped using single-end or paired-end sequencing. In one aspect, mapping to the human genome to find a location does not require an exact match of each of the nucleotides for a location to be found.
[00195] In step 2430, for each of at least a portion of the loci, a maternal sequence and a paternal sequence are determined at the locus in question. For example, if 100 loci are determined in step 2420, then the maternal and paternal sequences at these 100 loci can be determined. In one embodiment, the paternal sequence is determined from a sample from the father, as opposed to using reference haplotypes as described above. Thus, a mutation that is not in a reference genome can still be detected. In various embodiments, the maternal sequence can be obtained from a sample that includes only maternal DNA, or it can be obtained from the biological sample, for example, using the methods described herein.
[00196] In step 2440, a first sequence in the plurality of nucleic acid molecules that is not present in the determined maternal or paternal sequences is identified. In one embodiment, a comparison of the first sequence with the determined maternal or paternal sequences requires an exact match. Thus, if the match is not exact, the first sequence is considered not to be present in the determined maternal or paternal sequences. In this way, even a de novo mutation that is just a single nucleotide change can be identified. In another embodiment, a certain number of DNA fragments showing the non-maternal and non-paternal sequence is required for the sequence to be judged a de novo mutation. For example, a cutoff of 3 DNA fragments can be used to determine whether a sequence, i.e. the de novo mutation, is present or not.
[00197] In step 2450, a first fractional concentration of the first sequence in the biological sample is determined. For example, the number of DNA fragments that exhibit the first sequence can be expressed as a proportion of all detected DNA fragments from this locus.
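Steps 2440 and 2450 can be sketched for a single locus as shown below. The sketch counts the fragments that exactly match neither parent, applies a minimum fragment cutoff, and expresses the count as a proportion of all fragments at the locus. The function name `candidate_novel_sequences`, the example reads and the representation of a locus as a short string are assumptions made for illustration.

```python
from collections import Counter


def candidate_novel_sequences(fragments, maternal_sequences, paternal_sequences,
                              min_fragments=3):
    """Return {sequence: fractional concentration} for non-parental sequences.

    fragments are the sequences observed at one locus in the biological sample.
    A sequence is kept only if it exactly matches no parental sequence and is
    seen in at least min_fragments fragments (3 is the example cutoff above).
    """
    counts = Counter(fragments)
    total = sum(counts.values())
    known = set(maternal_sequences) | set(paternal_sequences)
    return {
        sequence: count / total
        for sequence, count in counts.items()
        if sequence not in known and count >= min_fragments
    }


# Hypothetical locus: 90 parental fragments, 9 with a recurrent change,
# and 1 fragment carrying an isolated (likely erroneous) change.
reads = ["ACGT"] * 90 + ["ACTT"] * 9 + ["AGGT"] * 1
print(candidate_novel_sequences(reads, {"ACGT"}, {"ACGT"}))
```

The single-fragment variant is discarded by the cutoff, consistent with the observation that random sequencing errors have a low probability of recurrence.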
[00198] In step 2460, a second fractional concentration of fetal nucleic acids in the biological sample is determined using a nucleic acid molecule that the fetus inherited from its father, and which is present in the paternal genome but not in the maternal genome. Such a nucleic acid molecule would contain a first allele at a locus where the father is homozygous and the mother is also homozygous, but for a different allele, so that the fetus is an obligate heterozygote. Informative loci as described above can be used to determine the nucleic acid molecule used to determine the second fractional concentration.
[00199] In other embodiments, the second fractional concentration can be determined using other methods, such as PCR assays, digital PCR assays or mass spectrometry-based assays targeting the Y chromosome or a panel of genetic polymorphisms, e.g. single nucleotide polymorphisms or insertion-deletion polymorphisms (Lun FMF et al. Clin Chem 2008; 54: 1664-1672). Another alternative is to use one or more genomic loci that exhibit differential DNA methylation between the fetus and the mother (Poon LLM et al. Clin Chem 2002; 48: 35-41; Chan KCA et al. Clin Chem 2006; 52: 2211-2218; US patent 6,927,028).
[00200] In one embodiment, the different epigenetic status is reflected by different DNA methylation patterns. The different DNA methylation patterns may involve the RAS association domain family 1A (RASSF1A) gene or the holocarboxylase synthetase (biotin-(proprionyl-CoA-carboxylase (ATP-hydrolysing)) ligase) (HLCS) gene. The number of DNA fragments with the fetus-specific DNA methylation profile can be expressed as a proportion of all DNA fragments originating from the differentially methylated locus.
[00201] In step 2470, the first sequence is classified as a de novo mutation if the first and second fractional concentrations are approximately the same. A non-maternal and non-paternal sequence that originates from errors in the analysis process, for example sequencing errors, is a random event and has a low probability of recurrence. Therefore, multiple DNA fragments that exhibit the same non-maternal and non-paternal sequence, in amounts similar to the measured fractional fetal DNA concentration for the sample, are likely to represent a de novo mutation present in the fetal genome rather than a sequencing error. In one embodiment, a cutoff value can be used to determine whether the fractional concentrations are the same. For example, if the concentrations are within a specified value of each other, then the first sequence is classified as a de novo mutation. In various embodiments, the specified value can be 5%, 10%, or 15%.
EXAMPLES
I. EXAMPLE 1
[00202] To illustrate embodiments of the present invention, the following case was analyzed. A couple attending an obstetric clinic for the prenatal diagnosis of beta-thalassemia was recruited. The father was a carrier of the 4-base-pair deletion (CTTT) of codons 41/42 of the human beta-globin gene. The pregnant mother carried the A -> G mutation at nucleotide -28 of the human beta-globin gene. Blood samples were collected from the father and mother. For the mother, the blood sample was collected before chorionic villus sampling (CVS) at 12 weeks of gestation. After CVS, a portion of the sample was stored for the experiment. One objective of the experiment was to construct a genome-wide genetic map, or to determine the partial or complete genomic sequence, of the fetus by massively parallel sequencing of maternal plasma DNA.
1. Determination of parental genotypes
[00203] DNA was extracted from the buffy coats of the father and mother, and from the CVS sample. These DNA samples were subjected to analysis by the Affymetrix Genome-Wide Human SNP Array 6.0 system. This system features 1.8 million genetic markers, including 900,000 single nucleotide polymorphisms (SNPs) and more than 950,000 probes for detecting copy number variations. The absolute numbers and percentages of SNPs showing different genotype combinations for the father, mother and fetus (CVS) are shown in the table in FIG. 25A.
[00204] Although the Affymetrix system was used in this example, in practice any genotyping platform known to those skilled in the art could be used. Indeed, apart from genotyping, the buffy coat DNA of the father and mother can also be subjected to sequencing, either on a whole-genome basis or for selected genomic regions. In addition, any source of constitutional DNA (e.g. buccal cell DNA, hair follicle DNA, etc.) from the father and mother could be used to establish the parental genotypes.
[00205] The CVS sample was analyzed to provide a standard for comparison with the fetal genetic map deduced from the analysis of maternal plasma. In addition, for this experiment, the genotype of the CVS sample can also be used to construct the mother's haplotypes for RHDO analysis. In this scenario, the CVS genotype was used for haplotype construction for illustration purposes only. In a clinical application of embodiments, the maternal haplotypes can be constructed by analyzing other individuals in the family, for example, a previous offspring, a sibling, the parents or other relatives of the mother. The maternal haplotypes of the chromosomal regions of interest can also be constructed by other methods well known to those skilled in the art, some of which are mentioned herein.
[00206] For selected embodiments, the haplotype of the father of the unborn fetus to be analyzed can also be determined. This information can be particularly useful for relative haplotype dosage analysis of chromosomal regions in which both the father and the mother are heterozygous.
2. Massively parallel sequencing of maternal plasma DNA
[00207] Plasma DNA obtained from the mother was subjected to massively parallel sequencing using the Illumina Genome Analyzer platform. Paired-end sequencing of the plasma DNA molecules was performed. Each molecule was sequenced at each end for 50 base pairs, thus totaling 100 base pairs per molecule. The two ends of each sequence were aligned to the non-repeat-masked human genome (Hg18 NCBI.36, downloaded from UCSC, http://genome.ucsc.edu) using the SOAP2 program from the Beijing Genomics Institute at Shenzhen (soap.genomics.org.cn) (Li R et al. Bioinformatics 2009; 25(15): 1966-7). The table in FIG. 25B lists the alignment statistics for the first 20 flow cells. Thus, with 20 flow cells, more than 3.932 billion reads were aligned to the reference human genome.
3. Calculation of fractional concentrations of fetal DNA
[00208] As mentioned above, the fractional concentration of fetal DNA in the maternal plasma sample can be calculated from the sequencing data. One way is to analyze SNPs at which the father and mother were both homozygous, but for different alleles. For such SNPs, the fetus would be an obligate heterozygote for one paternally inherited and one maternally inherited allele. In one embodiment, any of the calculation methods described in section V can be used. In this example, calculations were performed on the cumulative data across different polymorphic genetic loci that satisfied the parental genotype configuration (that is, both parents being homozygous, but for different alleles) on different chromosomes. The fractional concentrations of fetal DNA calculated for SNPs located on different chromosomes are listed in the rightmost column of FIG. 26. As can be seen from the table, the fractional concentrations determined for SNPs located on different chromosomes correlated very closely with each other.
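As an illustration of a cumulative calculation of this kind, the sketch below pools read counts over SNPs at which the parents are homozygous for different alleles. It assumes the commonly used relation f = 2p / (p + q), where p is the count of the fetus-specific (paternally inherited) allele and q is the count of the allele shared with the mother; this relation, the function name `fetal_fraction` and the counts are assumptions for the example and are not taken from the data of FIG. 26.

```python
def fetal_fraction(allele_counts):
    """Estimate the fractional fetal DNA concentration from pooled counts.

    allele_counts is a list of (fetus_specific_count, shared_count) pairs,
    one pair per SNP at which both parents are homozygous for different
    alleles. Assumes f = 2p / (p + q) on the cumulative counts.
    """
    p = sum(fetus_specific for fetus_specific, _ in allele_counts)
    q = sum(shared for _, shared in allele_counts)
    if p + q == 0:
        raise ValueError("no reads at the selected loci")
    return 2 * p / (p + q)


# Hypothetical counts at three informative SNPs.
counts = [(7, 93), (5, 95), (6, 94)]
print(f"{fetal_fraction(counts):.2f}")
```

The factor of 2 reflects that the fetus is heterozygous at these loci, so the fetus-specific allele represents only half of the fetal DNA.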
[00209] As a quality control experiment, SNPs at which the mother was homozygous and the father was heterozygous, according to the Affymetrix SNP 6.0 analysis of the buffy coat samples, were also investigated (middle column of FIG. 26). It can be seen that, at sufficient depth of DNA sequencing, the fractional concentrations of fetal DNA measured from this analysis were very similar to those measured for SNPs at which both father and mother were homozygous but for different alleles.
[00210] In one implementation, when near concordance of the fractional concentrations of fetal DNA was observed for these two types of SNPs, it could be concluded that the coverage of the fetal genome by sequencing was close to complete. In one aspect, at a lower depth of coverage, the fractional concentrations of fetal DNA measured for SNPs at which the mother was homozygous and the father was heterozygous would be higher than those measured for SNPs at which both the father and the mother were homozygous, but for different alleles. At such a lower depth of coverage, the absence of a paternally unique allele from the sequencing results can have two possible causes: (i) the fetus did not inherit this allele from the father; and/or (ii) the fetus inherited this allele from the father, but the allele was missed in the sequencing results because the sequencing depth was not sufficient.
4a. Calculation of the percentage coverage of the fetal genome
[00211] Also, as mentioned above, the percentage of the fetal genome that was analyzed by DNA sequencing from the maternal plasma could be determined by considering the subset of SNPs in which the father and mother were both homozygous, but for different alleles. In this family, 45,900 SNPs in the Affymetrix SNP 6.0 array belonged to this subset. The percentage coverage of the fetal genome could be deduced by analyzing the plasma DNA sequencing data to see at what percentage of this subset of SNPs a fetal allele could be detected by sequencing.
[00212] The plot in FIG. 27A illustrates the observed percentage of SNPs in this subset at which a fetal allele could be observed in the sequencing data for the first 20 flow cells analyzed. Thus, a fetal allele could be observed at 94% of such SNPs. This degree of sequencing corresponded to more than 3.932 billion reads, each with 100 base pairs of sequence. The plot in FIG. 27B shows the coverage versus the number of reads, rather than the number of flow cells. With the increase in throughput of different sequencing platforms, it is expected that the number of flow cells or runs used or required to generate this number of sequence reads or this length of sequence will decrease in the future.
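The percentage coverage described here reduces to the proportion of informative SNPs at which the fetal allele was seen at least once. A minimal sketch follows; the function name `fetal_genome_coverage`, the `min_reads` parameter and the read counts are hypothetical.

```python
def fetal_genome_coverage(fetal_allele_read_counts, min_reads=1):
    """Proportion of informative SNPs at which the fetal allele was detected.

    fetal_allele_read_counts holds, for each SNP at which both parents are
    homozygous for different alleles, the number of reads carrying the
    fetus-specific allele.
    """
    if not fetal_allele_read_counts:
        raise ValueError("no informative SNPs supplied")
    detected = sum(1 for count in fetal_allele_read_counts if count >= min_reads)
    return detected / len(fetal_allele_read_counts)


# Hypothetical read counts at ten informative SNPs.
counts = [0, 2, 1, 5, 0, 3, 1, 1, 4, 2]
print(f"{fetal_genome_coverage(counts):.0%}")
```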
[00213] In some embodiments, because multiple SNPs are detected in each chromosomal region or chromosome, the coverage of the fetal genome can be much lower than 94% while still providing an accurate genomic mapping. For example, assume that there are 30 informative SNPs in a chromosomal region, but that a fetal allele is detected at only 20 of the 30 SNPs.
[00214] The chromosomal region can nevertheless be accurately identified with the 20 SNPs. Thus, in one embodiment, equivalent accuracy can be achieved with a coverage of less than 94%.
4b. Coverage of the genetic map of alleles that the fetus inherited from its father
[00215] This illustrative analysis focuses on SNPs at which the father was heterozygous and the mother was homozygous. In this family, 131,037 SNPs on the Affymetrix SNP 6.0 platform belonged to this category. A subset of these SNPs consisted of 65,875 SNPs at which the mother was homozygous, while the father and the fetus were both heterozygous. With the use of 20 flow cells, the paternally inherited alleles could be observed at 61,875 of these SNPs, indicating a coverage of 93.9%. This percentage agreed well with the percentage coverage deduced in the previous section. The correlations between the coverage of paternally inherited alleles and the number of mappable sequence reads and the number of flow cells sequenced are shown in FIG. 28A and FIG. 28B, respectively.
[00216] To elucidate the specificity of this method for detecting genuine paternally inherited fetal alleles, the 65,162 (i.e. 131,037 - 65,875) SNPs at which the fetus inherited alleles that were the same as those possessed by the mother were analyzed. For such SNPs, the apparent detection of alleles other than those possessed by the mother would represent a false positive. Among the 65,162 SNPs, only 3,225 false positives (4.95%) were observed when 20 flow cells were analyzed. These false positives may be the result of sequencing errors, genotyping errors in the DNA of the father or mother, or de novo mutations in the fetus. The correlation between the false-positive rate and the number of flow cells sequenced is shown in FIG. 29A.
[00217] The false-positive rate can also be estimated by considering the subset of SNPs at which both father and mother were homozygous for the same allele. The presence of any alternative allele at such a locus was considered a false positive. These false positives may be the result of sequencing errors, genotyping errors in the DNA of the father or mother, or de novo mutations in the fetus. There were 500,673 SNPs in this subset. With the sequencing data from 20 flow cells, false-positive results were detected at 48,396 SNPs (9.67%). The correlation between the false-positive rate and the number of flow cells sequenced is shown in FIG. 29B. This false-positive rate was higher than the estimate obtained using the subset of SNPs at which the mother and the fetus were homozygous and the father was heterozygous. This is because, in the latter subset of SNPs, only the presence of the paternally inherited allele in maternal plasma is considered a false positive, whereas in the former subset any allele other than the common allele shared by the father and mother is considered a false-positive result.
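The percentages reported in paragraphs [00215] to [00217] can be reproduced from the stated SNP counts with a few lines of arithmetic. The variable names below are chosen for the illustration only; the counts are those given in the text.

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Counts stated in the text (20 flow cells).
father_het_mother_hom = 131_037   # father heterozygous, mother homozygous
fetus_het = 65_875                # subset in which the fetus is heterozygous
paternal_allele_seen = 61_875     # paternally inherited allele observed
false_pos_first = 3_225           # non-maternal allele seen, fetus matches mother
parents_same_hom = 500_673        # both parents homozygous for the same allele
false_pos_second = 48_396         # any alternative allele seen

# SNPs at which the fetus inherited the same allele as the mother's.
fetus_matches_mother = father_het_mother_hom - fetus_het

coverage = percent(paternal_allele_seen, fetus_het)
fp_rate_1 = percent(false_pos_first, fetus_matches_mother)
fp_rate_2 = percent(false_pos_second, parents_same_hom)

print(f"coverage {coverage:.1f}%, false positives {fp_rate_1:.2f}% and {fp_rate_2:.2f}%")
```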
[00218] FIG. 30 shows the coverage of fetus-specific SNPs for different numbers of flow cells analyzed. SNPs at which both father and mother were homozygous, but for different alleles, are included in this analysis. The X axis is the fold coverage for the fetus-specific SNPs, and the Y axis is the percentage of SNPs with the specified fold coverage. As the number of flow cells analyzed increases, the average fold coverage for the fetus-specific SNPs increases. For example, when one flow cell was analyzed, the average coverage of the SNPs was 0.23-fold. The average coverage increased to 4.52-fold when 20 flow cells were analyzed.
5. Accuracy of the genetic map inherited from the mother
[00219] FIG. 31 shows the accuracy of the Type A analysis when data from 10 flow cells were used. Section II.B describes how to perform Type A and Type B analyses (also referred to as alpha and beta). The accuracy is for the correct determination of the haplotype that was inherited from the mother. The accuracy is shown separately for each chromosome.
[00220] Using a likelihood ratio of 1,200 for the SPRT analysis (Zhou W et al. Nat Biotechnol 2001; 19: 78-81; Karoui NE et al. Statist Med 2006; 25: 3124-33), the accuracy ranged from 96% to 100%. As shown, even with such a high likelihood ratio for the SPRT classification, a total of 2,760 segments across the genome could be classified. This degree of resolution is sufficient for most purposes, considering that meiotic recombination occurs at a frequency of one to a low single-digit number per chromosome arm per generation. In addition, it can be seen that all of the misclassifications could be avoided when the interlacing approach was used (right side of FIG. 31). As described above, the interlacing approach uses both Type A and Type B analysis.
[00221] FIG. 32 shows the accuracy of the Type B analysis when data from 10 flow cells were used. Using a likelihood ratio of 1,200 for the SPRT analysis, the accuracy ranged from 94.1% to 100%. All of the misclassifications could be avoided when the interlacing approach was used (right side of FIG. 32), as was seen in FIG. 31.
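To make the role of the likelihood ratio threshold concrete, the following is a textbook binomial SPRT applied to the cumulative allele counts of alpha-type SNPs in a block. It is a sketch under stated assumptions, not the classification procedure actually used for FIGS. 31 to 34: the expected Hap A proportions of 0.5 (balanced) and (1 + f) / 2 (overrepresented), where f is the fractional fetal DNA concentration, are derived here from the description of alpha-type SNPs, and the function name `sprt_classify` and the counts are hypothetical.

```python
import math


def sprt_classify(hap_a_count, hap_b_count, fetal_fraction, threshold=1200.0):
    """Classify a block of alpha-type SNPs with a binomial SPRT.

    Assumed hypotheses: the proportion of Hap A alleles is 0.5 if the fetus
    inherited maternal Hap B, and (1 + f) / 2 if it inherited maternal Hap A.
    A call is made only when the likelihood ratio reaches threshold or
    its reciprocal.
    """
    if not 0.0 < fetal_fraction < 1.0:
        raise ValueError("fetal_fraction must lie strictly between 0 and 1")
    p_balanced = 0.5
    p_over = (1.0 + fetal_fraction) / 2.0
    log_lr = (hap_a_count * math.log(p_over / p_balanced)
              + hap_b_count * math.log((1.0 - p_over) / (1.0 - p_balanced)))
    bound = math.log(threshold)
    if log_lr >= bound:
        return "Hap A overrepresented"
    if log_lr <= -bound:
        return "balanced"
    return "unclassified"


print(sprt_classify(1100, 900, 0.10))
print(sprt_classify(525, 475, 0.10))
```

With a high threshold such as 1,200, blocks with few reads remain unclassified until more counts accumulate, which trades resolution for accuracy.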
[00222] FIG. 33 shows the accuracy of the Type A analysis when data from 20 flow cells were used. Using a likelihood ratio of 1,200 for the SPRT analysis and the “two consecutive blocks” algorithm, a total of 3,780 classifications were made and only 3 (0.1%) of the classifications were incorrect. FIG. 34 shows the accuracy of the Type B analysis when data from 20 flow cells were used. Using a likelihood ratio of 1,200 for the SPRT analysis and the “two consecutive blocks” algorithm, a total of 3,355 classifications were made and only 6 (0.2%) of the classifications were incorrect. In these examples, the SPRT is performed across a number of genetic markers, such as SNPs.
II. PRENATAL DETERMINATION OF BETA-THALASSEMIA RISK
[00223] In one embodiment, to determine the risk of the fetus having beta-thalassemia (an autosomal recessive disease), it can be determined whether the fetus inherited the mutant alleles carried by its father and mother. In the case mentioned above, the father is a carrier of the 4-base-pair deletion (CTTT) of codons 41/42 of the human beta-globin gene. The pregnant mother carried the A -> G mutation at nucleotide -28 of the human beta-globin gene.
[00224] To determine whether the fetus inherited the paternal codons 41/42 mutation, the sequencing data of maternal plasma DNA from the first 10 flow cells were searched for this mutation. A total of 10 reads with this mutation were found (FIG. 35A). Hence, the fetus inherited the paternal mutation. In addition, 62 reads were found to contain the wild-type sequence at codons 41/42 (FIG. 35B). Thus, the proportion of reads in this region containing the mutation is 0.1389. This figure is very close to the fractional concentration of fetal DNA determined in FIG. 26. In one embodiment, the risk of the fetus inheriting the paternal mutation can also be determined by elucidating its inheritance of genetic polymorphisms linked to the paternal mutation.
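The proportion quoted above follows directly from the two read counts, as the short calculation below shows. The function name `mutant_read_fraction` is chosen for the illustration only.

```python
def mutant_read_fraction(mutant_reads, wild_type_reads):
    """Fraction of reads at a locus that carry the mutation."""
    total = mutant_reads + wild_type_reads
    if total == 0:
        raise ValueError("no reads cover the locus")
    return mutant_reads / total


# 10 mutant reads and 62 wild-type reads at codons 41/42 (first 10 flow cells).
print(round(mutant_read_fraction(10, 62), 4))
```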
[00225] In one embodiment, to determine the risk that the fetus inherited the maternal -28 mutation, RHDO analysis was performed. In this family, the -28 mutation was located on haplotype IV while the wild-type allele was located on haplotype III. The results of the Type A RHDO analysis are shown in FIG. 36, whereas those of the Type B RHDO analysis are shown in FIG. 37. In both types of analysis, fetal inheritance of haplotype III from the mother was deduced. In other words, the fetus inherited the wild-type allele from the mother. The final diagnosis was that the fetus inherited the codons 41/42 mutation from the father and a wild-type allele from the mother. Thus, the fetus is a heterozygous carrier of beta-thalassemia and should therefore be clinically healthy.
III. TARGET ENRICHMENT AND TARGETED SEQUENCING
[00226] As discussed in the previous sections, the precision of the estimate of the fractional concentration of fetal DNA and the resolution of the genetic map deduced from the analysis of maternal plasma DNA can depend on the depth of coverage of the loci of interest. For example, we demonstrated that a total of 200 molecules corresponding to a SNP locus would be required to determine, with high precision, the fractional concentration of fetal DNA without prior information on the maternal genotype. Allele counts for a SNP in maternal plasma can be obtained by, for example but not limited to, real-time PCR, digital PCR and massively parallel sequencing.
[00227] As massively parallel sequencing of maternal plasma DNA can simultaneously determine the allele counts for millions of SNPs across the whole genome, it is an ideal platform for genome-wide analysis across different loci. The basic format of massively parallel sequencing allows different regions within the genome to be covered at similar depths. However, in order to sequence a particular region of interest at high sequencing depth using random massively parallel sequencing, the remaining parts of the genome (not intended to be analyzed) have to be sequenced to the same extent. Thus, this approach can be costly. One way to improve the cost-effectiveness of massively parallel sequencing is to enrich the target region before proceeding with sequencing. Targeted sequencing can be performed by solution phase capture (Gnirke A, et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol 2009; 27: 182-9), microarray capture (for example, using the NimbleGen platform) or targeted amplification (Tewhey R, et al. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol 2009; 27: 1025-31).
[00228] Targeted sequencing was initially applied to detect genetic variations in populations, for example in genetic association studies. Its current application in genomic research is therefore aimed at solving qualitative problems (for example genotyping or mutation detection). However, applying targeted sequencing to maternal plasma DNA for noninvasive prenatal diagnosis involves quantitative considerations, the feasibility of which has been uncertain. For example, the use of targeted sequencing could introduce quantitative bias in the detection of fetal and maternal DNA in maternal plasma. In addition, previous work has shown that fetal DNA is shorter than maternal DNA (Chan KCA et al. Size distributions of maternal and fetal DNA in maternal plasma. Clin Chem 2004; 50: 88-92). This size difference could also introduce quantitative bias or differential efficiency in the capture of fetal and maternal DNA in maternal plasma. Nor were we sure how efficiently such fragmented DNA molecules would be captured. In the following descriptions, we demonstrate that targeted sequencing can be achieved by target enrichment followed by massively parallel sequencing. We also show that target enrichment is an efficient way to estimate the fractional concentration of fetal DNA compared with whole-genome sequencing.
A. Determine Fractional Concentration Using Target Enrichment
1. Materials and Methods
[00229] Four pregnant women (M6011, M6028, M6029 and M6043) with singleton female fetuses were recruited. Maternal peripheral blood samples were collected in EDTA blood tubes before elective cesarean section in the third trimester, while placenta samples were collected after elective cesarean section. After centrifugation, DNA from peripheral blood cells was extracted using the Blood Mini Kit (Qiagen). DNA from 2.4 ml of plasma was extracted with the DSP DNA Blood Mini Kit (Qiagen). Maternal genomic DNA was extracted from the buffy coat and fetal genomic DNA was extracted from placental tissues. Third-trimester samples were used in this example for illustration purposes only. First- and second-trimester samples can also be used.
[00230] Maternal and fetal genotypes were determined with the Genome-Wide Human SNP Array 6.0 (Affymetrix). For each case, 5 to 30 ng of plasma DNA was used to construct the DNA library with the paired-end sample preparation kit (Illumina), according to the manufacturer's protocol for Chromatin Immunoprecipitation Sequencing sample preparation. The adapter-ligated DNA was purified directly using the spin columns provided in a QIAquick PCR Purification Kit (Qiagen), without further size selection. The adapter-ligated DNA was then amplified by 15-cycle PCR with standard primers. The primers were PCR Primer PE 1.0 and 2.0 from Illumina. The DNA libraries were quantified using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies) and run on a 2100 Bioanalyzer, using a DNA 1000 kit (Agilent), to check the size distribution. For each sample, 0.6 to 1 µg of amplified plasma DNA library was generated, with an average size of about 290 base pairs. The capture library was obtained from Agilent and covered 85% of the exons on human chrX (catalog number: 5190-1993). For all four cases in this study, 500 ng of the amplified plasma DNA library was incubated with the capture probes for 24 hours at 65 °C, according to the manufacturer's instructions. After hybridization, the captured targets were selected by pulling down the biotinylated probe/target hybrids using streptavidin-coated magnetic beads (Dynal DynaMag-2, Invitrogen) and purified with the MinElute PCR Purification Kit (Qiagen). Finally, the targeted DNA libraries were enriched by 12-cycle PCR amplification with SureSelect GA PE primers from Agilent. The PCR products were purified with the QIAquick PCR Purification Kit (Qiagen). The DNA libraries, with or without target enrichment, were then subjected to random massively parallel sequencing on the Illumina Genome Analyzer IIx. One sequencing lane of a standard flow cell was used to sequence each DNA library.
2. Fractional concentration of fetal DNA without target enrichment
[00231] The fractional concentration of fetal DNA can be calculated from the allele counts of the informative SNPs (i.e., SNPs for which the mother is homozygous and the fetus is heterozygous). The table below shows that 120,184, 110,730, 107,362 and 110,321 informative SNPs were identified across the entire genome for the four cases, while 63, 61, 69 and 65 (respectively, in the same order) fell within the targeted region on the X chromosome. Without target enrichment, the fractional concentrations of fetal DNA were 33.4%, 31.3%, 29.2% and 34.4%, based on data from all informative SNPs in the genome.
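The calculation can be illustrated with a short sketch. It assumes the commonly used estimator f = 2p / (p + q), where p is the number of reads carrying the fetal-specific allele and q is the number carrying the allele shared by mother and fetus, summed over the informative SNPs. The factor of 2 reflects that the heterozygous fetus carries the fetal-specific allele on only one of its two chromosomes. The estimator, the function name and the counts are illustrative assumptions and are not taken from the table.

```python
def fetal_fraction(fetal_specific_reads: int, shared_reads: int) -> float:
    """Estimate the fractional fetal DNA concentration from pooled allele counts.

    Assumes informative SNPs only (mother homozygous, fetus heterozygous),
    so the fetal-specific allele accounts for half of the fetal DNA.
    """
    total = fetal_specific_reads + shared_reads
    if total == 0:
        raise ValueError("no reads at informative SNPs")
    return 2.0 * fetal_specific_reads / total


# Hypothetical pooled counts, for illustration only.
estimate = fetal_fraction(150, 850)
print(f"{estimate:.1%}")
```

With very few fetal-specific reads the estimate becomes unstable, which is why sufficient depth at the informative SNPs matters.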

3. Comparison of samples with and without target enrichment
[00232] In some embodiments, the sequencing depth represents the average number of times each base in a particular region was sequenced. In this embodiment, we calculated the sequencing depth of the targeted region by dividing the total number of sequenced bases within the targeted region by the length of the targeted region (3.05 Mb). For the regions covered by the enrichment kit, the average sequence coverage was 0.19-fold for the non-enriched samples and 54.9-fold for the enriched samples, indicating an average enrichment of 289-fold. At this sequencing depth, only 4.0% of the fetal-specific alleles within the targeted region were detected before target enrichment (see table below). In comparison, 95.8% of them became detectable after target enrichment (see table below). Therefore, target enrichment greatly increased the detection rate of fetal-specific alleles within the targeted region.
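The depth and enrichment arithmetic described above can be sketched as follows. The function names are hypothetical; the 3.05 Mb region length and the 0.19-fold and 54.9-fold depths are the values quoted in the text.

```python
def mean_depth(total_bases_in_region: int, region_length_bp: int) -> float:
    """Average number of times each base of the region was sequenced."""
    return total_bases_in_region / region_length_bp


def fold_enrichment(depth_enriched: float, depth_unenriched: float) -> float:
    """Ratio of the mean depth with enrichment to the mean depth without it."""
    return depth_enriched / depth_unenriched


TARGET_LENGTH_BP = 3_050_000  # 3.05 Mb targeted region, from the text

# Depths reported in the text for enriched and non-enriched samples.
fold = fold_enrichment(54.9, 0.19)
print(round(fold))  # 289
```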
[00233] Next, we compared the fractional concentrations of fetal DNA based on the read counts of all informative SNPs within the targeted region for each sample, with and without enrichment. Without target enrichment, the number of fetal-specific reads ranged from 0 to 6 for the four samples (see table below). Because of the low sequencing coverage, inadequate sampling of fetal DNA molecules would prevent an accurate estimate of the fractional concentration of fetal DNA. With target enrichment, a much larger number of fetal-specific allele counts (511-776) and shared allele counts (2570-3922) within the targeted region were observed (see table below). The percentages of fetal DNA were calculated as 35.4%, 33.2%, 26.1% and 33.0%, consistent with the fetal DNA percentages estimated from the genome-wide data in the non-enriched samples (see table below). These results indicate that the maternal and fetal DNA molecules were enriched to a similar degree within the targeted region.
B. Fetal Genome Determination Using Target Enrichment
[00234] One application of the RHDO method is the noninvasive prenatal detection of maternally inherited genetic diseases. Using massively parallel sequencing of maternal plasma without target enrichment, RHDO analysis can accurately determine which maternal haplotype is passed to the fetus with an average of 17 SNPs when the sequencing depth of maternal plasma DNA is approximately 65-fold coverage of the human genome. To improve the cost effectiveness of this method, the sequencing can be selectively targeted to specific regions of interest within the genome, and RHDO analysis can then be applied to the sequencing data. As an example, we demonstrate the concept using targeted sequencing and RHDO analysis of the X chromosome. However, targeted sequencing and RHDO analysis can also be applied to all chromosomes, for example to autosomes. In one embodiment, an RHDO analysis as described above can be used for the targeted embodiments.
[00235] Five pregnant women (PW226, PW263, PW316, PW370 and PW421) with singleton male fetuses were recruited. Maternal peripheral blood samples were collected in EDTA blood tubes before chorionic villus sampling in the first trimester. After centrifugation, DNA from peripheral blood cells was extracted using the Blood Mini Kit (Qiagen). DNA from 3.2 ml of plasma was extracted with the DSP DNA Blood Mini Kit (Qiagen). Maternal genomic DNA was extracted from the buffy coat and fetal genomic DNA was extracted from chorionic villi. The samples were prepared and analyzed as described above. Each sample was then randomly sequenced using one lane of an Illumina flow cell.
[00236] In this example, we used the fetal genotype, together with the mother's nucleic acid sequencing information, to deduce the maternal haplotypes for the X chromosome and to deduce which haplotype was inherited from the mother. For each SNP on the X chromosome for which the mother was heterozygous (i.e., an informative SNP), the allele that was inherited by the fetus was defined as originating from maternal haplotype 1 (Hap I), whereas the maternal allele that was not passed on to the fetus was defined as originating from maternal haplotype 2 (Hap II). In some embodiments, for clinical applications, the fetal genotype may not be available beforehand, and the maternal haplotypes can be determined or deduced by methods well known to those skilled in the art and by methods described herein. The X chromosome is used here for illustration purposes only. Other chromosomes, for example autosomes, can also be used in such an analysis.
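The haplotype definition used in this example can be sketched in a few lines. The data layout (dictionaries keyed by SNP position) and the function name are hypothetical; the sketch assumes a male fetus, so that a single allele is observed per X-chromosome SNP in the fetal genotype.

```python
def assign_haplotypes(maternal_genotypes, fetal_alleles):
    """Split maternal alleles into Hap I (transmitted) and Hap II (untransmitted).

    maternal_genotypes: dict mapping SNP position -> (allele_a, allele_b)
    fetal_alleles: dict mapping SNP position -> the fetus's single chrX allele
    Returns two dicts (hap1, hap2) keyed by position, informative SNPs only.
    """
    hap1, hap2 = {}, {}
    for pos, (a, b) in sorted(maternal_genotypes.items()):
        if a == b:
            continue  # mother homozygous: SNP is not informative here
        fetal = fetal_alleles.get(pos)
        if fetal == a:
            hap1[pos], hap2[pos] = a, b
        elif fetal == b:
            hap1[pos], hap2[pos] = b, a
        # a missing or inconsistent fetal genotype leaves the SNP unassigned
    return hap1, hap2


# Hypothetical genotypes, for illustration only.
mother = {100: ("A", "G"), 200: ("C", "C"), 300: ("T", "C")}
fetus = {100: "G", 200: "C", 300: "T"}
hap_i, hap_ii = assign_haplotypes(mother, fetus)
```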
[00237] All five cases described here carried a singleton male fetus. Since a male fetus inherits an X chromosome from the mother but none from the father, the maternal X chromosome that was passed on to the fetus would be overrepresented in maternal plasma. The RHDO analysis was performed from the pter to the qter of the X chromosome. Starting with the SNP closest to the pter of the X chromosome, SPRT analysis can determine whether the alleles on Hap I or Hap II are statistically significantly overrepresented in maternal plasma. If neither haplotype is statistically significantly overrepresented, the allelic counts for the next SNP can be combined for another SPRT analysis. Additional SNPs can be combined for analysis until the SPRT process identifies one of the haplotypes as being statistically significantly overrepresented. The classification process can then be restarted at the next SNP.
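The sequential procedure described above can be sketched as a Wald sequential probability ratio test over cumulative haplotype counts. Several points are assumptions rather than details given in the text: the two hypotheses are taken to be symmetric, the expected Hap I proportion for a male fetus is taken as 1 / (2 - f) from simple copy counting (two maternal X copies per maternal genome, one fetal X copy per fetal genome), and the error rates alpha and beta and the function name are illustrative.

```python
import math


def sprt_blocks(snp_counts, fetal_fraction, alpha=0.001, beta=0.001):
    """Classify consecutive SNP blocks as 'Hap I' or 'Hap II' by SPRT.

    snp_counts: list of (hap1_reads, hap2_reads), ordered from pter to qter.
    Returns a list of (first_snp_index, last_snp_index, inherited_haplotype).
    """
    # Assumed expected Hap I proportion if Hap I was inherited (male fetus, chrX).
    p1 = 1.0 / (2.0 - fetal_fraction)
    # Under the symmetric alternative the Hap I proportion is 1 - p1, so the
    # log-likelihood ratio reduces to (n1 - n2) * log(p1 / (1 - p1)).
    step = math.log(p1 / (1.0 - p1))
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))

    blocks = []
    n1 = n2 = 0
    start = 0
    for i, (c1, c2) in enumerate(snp_counts):
        n1 += c1
        n2 += c2
        llr = (n1 - n2) * step
        if llr >= upper:
            blocks.append((start, i, "Hap I"))
        elif llr <= lower:
            blocks.append((start, i, "Hap II"))
        else:
            continue  # not yet significant: combine with the next SNP
        n1 = n2 = 0  # restart the classification at the next SNP
        start = i + 1
    return blocks


# Hypothetical read counts, for illustration only.
calls = sprt_blocks([(20, 5), (22, 4), (3, 20), (2, 20)], fetal_fraction=0.2)
```

Because the step size grows with the fetal fraction, fewer reads are needed per block when fetal DNA is more abundant, which matches the trend reported for the five cases.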
[00238] FIGS. 38A and 38B show the results of SPRT classification for case PW226 as an example. There were a total of nine successful SPRT classifications for the X chromosome in this case. For each SPRT classification, the Hap I alleles were shown to be overrepresented in the maternal plasma sample, indicating that the fetus inherited Hap I from the mother. As we defined Hap I as the haplotype containing the alleles passed to the fetus, the results of all these SPRT classifications were correct.
[00239] The results of the RHDO analysis for the five cases are summarized in FIG. 39. The number of successful SPRT classifications ranged from 1 to 9. All SPRT classifications were correct. A higher fractional fetal DNA concentration was associated with a higher number of classifications. This is because the allelic imbalance due to the presence of fetal DNA can be detected more easily when the fractional concentration of fetal DNA is higher. Therefore, fewer SNPs may be required to achieve a successful RHDO classification. The defined chromosomal region(s) can therefore be divided into more RHDO blocks. Our results confirm that RHDO analysis can be performed on the massively parallel sequencing data obtained after target enrichment.
[00240] Our data also showed that the targeted method is a more cost-effective way of performing RHDO analysis. Without target enrichment, for samples with similar fetal DNA concentrations, sequencing on approximately 5 flow cells (i.e., 40 sequencing lanes) was required (FIG. 40) to achieve the average depth obtained for the samples shown in FIG. 39. Here we show that, with target enrichment, sequencing in just one lane already reaches an average sequencing depth of some 15- to 19-fold for successful RHDO classification. Alternatively, an even higher level of sequencing coverage can be achieved at relatively little additional cost when target enrichment is used. The higher level of sequencing coverage can effectively reduce the size of the genomic region required for successful RHDO classification and consequently improve the resolution of the analysis.
IV. TARGET ENRICHMENT
[00241] It has been known since 2004 that circulating fetal DNA molecules are generally shorter than maternal DNA in maternal plasma (Chan KCA et al. Clin Chem 2004; 50: 88-92; Li et al. Clin Chem 2004). However, the molecular basis of this observation has remained unresolved. In our current study, we generated 3.931 x 10^9 reads from the study plasma sample and used 1 base pair bins in our bioinformatics analysis. The size of each sequenced plasma DNA molecule was deduced from the genome coordinates of the ends of the paired-end reads.
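As a minimal sketch of this size deduction, the fragment length can be taken as the span between the outermost aligned coordinates of a read pair and tallied in 1 base pair bins. The coordinate convention (1-based, inclusive) and the function names are assumptions made for illustration.

```python
from collections import Counter


def fragment_size(leftmost_start: int, rightmost_end: int) -> int:
    """Fragment length from the outermost aligned bases (1-based, inclusive)."""
    return rightmost_end - leftmost_start + 1


def size_histogram(read_pairs):
    """Count fragments per size in 1 bp bins.

    read_pairs: iterable of (leftmost_start, rightmost_end) per read pair.
    """
    return Counter(fragment_size(start, end) for start, end in read_pairs)


# Hypothetical alignments, for illustration only.
pairs = [(1000, 1165), (5000, 5142), (9000, 9165)]
hist = size_histogram(pairs)
```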
[00242] For this analysis, we focused on single nucleotide polymorphisms (SNPs) at which the father and the mother were both homozygous, but for different alleles. For such SNPs, the fetus was an obligate heterozygote. The allele at each SNP that the fetus inherited from the father could be used as a fetal-specific marker. The sizes of the fetal sequences (identified using the paternally inherited fetal-specific alleles) and of the total sequences were determined for the entire genome (FIG. 41) and individually for each chromosome (FIGS. 42A-42C).
[00243] We observed that the most significant difference between fetal and maternal DNA in maternal plasma is the reduction of the 166 base pair peak relative to the 143 base pair peak (FIG. 41). The most abundant total sequences (predominantly maternal) were 166 base pairs in length. The most significant difference in the size distribution between fetal and total DNA was that fetal DNA exhibited a reduction of the 166 base pair peak (FIG. 41) and a relative prominence of the 143 base pair peak. The latter probably corresponded to the trimming of a ~20 base pair linker fragment from a nucleosome down to its ~146 base pair core particle (Lewin B, in Genes IX, Jones and Bartlett, Sudbury, 2008, pp. 757-795).
[00244] From approximately 143 base pairs and below, the size distributions of both fetal and total DNA showed a 10 base pair periodicity reminiscent of nuclease-cleaved nucleosomes. These data suggest that the plasma DNA fragments are derived from apoptotic enzymatic processing. In contrast, the size analysis of reads that mapped to the mitochondrial genome, which is not bound to histones, did not show this nucleosomal pattern (FIG. 41). These results provide a previously unknown molecular explanation for the known size differences between fetal and maternal DNA observed using the Y chromosome and selected polymorphic genetic markers (Chan KCA et al. Clin Chem 2004; 50: 88-92; Li et al. Clin Chem 2004; 50: 1002-1011; US Patent Application 20050164241; US Patent Application 20070202525), and show that such size differences exist across the entire genome. The most likely explanation for this difference is that the circulating fetal DNA molecules comprise more molecules in which the ~20 base pair linker fragment has been trimmed from a nucleosome.
[00245] Given these observations, there are several ways in which the sample can be enriched for fetal DNA. In one embodiment, reagents can be used that would preferentially bind to the linker fragment. Such reagents would be expected to bind preferentially to maternally derived DNA rather than to fetally derived DNA in maternal plasma. An example of such a reagent is an antibody. One target of such an antibody is histone H1, which is known to bind to the linker fragment. One application of such an antibody is to enrich fetal DNA by negative selection, that is, by preferential immunoprecipitation of the maternally derived DNA in maternal plasma that contains the linker fragment bound to histone H1. In addition, H1 is known to have several variants, some of which exhibit tissue-specific variation in expression (Sancho M et al. PLoS Genet 2008; 4: e1000227). These variants could be further exploited to differentiate between fetal (predominantly placental) and maternal (predominantly hematopoietic) DNA (Lui YYN et al. Clin Chem 2002; 48: 421-427). For example, one can target a histone H1 variant that is predominantly expressed by trophoblastic cells to preferentially and positively select for fetally derived DNA in maternal plasma. This strategy can also be applied to other histone proteins or other nucleosomal proteins that exhibit tissue-specific, especially trophoblast-specific, patterns of expression.
[00246] Given the pronounced 166 base pair peak for maternal DNA, another possibility for enriching fetal DNA is to design a system for the negative selection of DNA fragments that are 166 ± 2 base pairs in length. For example, a system based on capillary electrophoresis or high-performance liquid chromatography would allow accurate size measurement and separation of DNA molecules. Another way to perform negative selection is to do so in silico during the bioinformatics analysis of the sequencing data.
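The in silico variant of this negative selection can be sketched as a simple size filter applied to the fragment lengths deduced from the sequencing data. The function name and the example sizes are hypothetical; the 166 base pair peak and the ± 2 base pair window are the values given in the text.

```python
def exclude_maternal_peak(fragment_sizes, peak=166, tolerance=2):
    """Drop fragments whose length lies within peak ± tolerance base pairs."""
    return [size for size in fragment_sizes if abs(size - peak) > tolerance]


# Hypothetical fragment lengths in base pairs, for illustration only.
sizes = [143, 164, 165, 166, 168, 169, 120]
kept = exclude_maternal_peak(sizes)
```

Fragments of 164 to 168 base pairs are discarded and all others are retained, so the remaining reads are expected to carry a higher proportion of fetal sequences.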
[00247] As other DNA species in plasma, for example tumor DNA (Vlassov VV et al. Curr Mol Med 2010; 10: 142-165) and transplanted organ DNA (Lo YMD et al. Lancet 1998; 351: 1329-1330), are also expected to share such characteristics with fetal DNA in maternal plasma, the strategies listed in (1) and (2) above could also be used for the enrichment of these DNA species.
[00248] According to an embodiment, a method for differential enrichment of DNA species in human plasma or serum by targeting the linker fragment of the nucleosome is provided. In one embodiment, enrichment is done by removing one of the following: maternally derived DNA or hematopoietic cell derived DNA. In another embodiment, the targeting involves a reagent (such as an antibody or another type of protein) that would preferentially bind to one of the protein or nucleic acid components of the nucleosome linker fragment. In another embodiment, the targeting reagent will selectively bind histone H1 or another protein that binds to the nucleosome linker fragment. In another embodiment, the targeting reagent will bind to the maternal or hematological variants of histone H1 or of another protein that binds to the nucleosome linker fragment. In one embodiment, DNA removal is accomplished by immunoprecipitation or by attachment to a solid surface.
[00249] According to another embodiment, a method for differential enrichment of fetal DNA in maternal plasma or serum includes: (a) using an antibody that would bind to one or more components of the linker fragment of the nucleosome; (b) removing the bound fraction by immunoprecipitation or by capture to a solid surface; and (c) harvesting the unbound fraction, which contains an increased fractional concentration of fetal DNA.
[00250] Any of the software components or functions described in this application can be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C++ or Perl, using, for example, conventional or object-oriented techniques. The software code can be stored as a series of instructions or commands on a computer-readable medium for storage and/or transmission. Suitable media include random access memory (RAM), read-only memory (ROM), a magnetic medium such as a fixed disk or a floppy disk, an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and others. The computer-readable medium can be any combination of such storage or transmission devices.
[00251] Such programs can also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks that conform to a variety of protocols, including the Internet. As such, a computer-readable medium according to an embodiment of the present invention can be created using a data signal encoded with such programs. Computer-readable media encoded with the program code can be packaged with a compatible device or supplied separately from other devices (for example, via download from the Internet). Any such computer-readable medium may reside on or within a single computer program product (for example, a fixed disk or an entire computer system), and may be present on or within different computer program products within a system or network. A computer system can include a monitor, printer, or other suitable display to provide any of the results mentioned here to a user.
[00252] An example of a computer system is shown in FIG. 43. The subsystems shown in FIG. 43 are interconnected via a system bus 4375. Additional subsystems such as a printer 4374, keyboard 4378, fixed disk 4379, monitor 4376, which is connected to the display adapter 4382, and others are shown. Peripherals and input/output (I/O) devices, which connect to the I/O controller 4371, can be connected to the computer system by any number of means known in the art, such as the serial port 4377. For example, serial port 4377 or external interface 4381 can be used to connect the computer device to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via the system bus allows the central processor 4373 to communicate with each subsystem and to control the execution of instructions from the system memory 4372 or the fixed disk 4379, as well as the exchange of information between subsystems. The system memory 4372 and/or the fixed disk 4379 can embody a computer-readable medium. Any of the values mentioned here can be output from one component to another component and can be output to the user.
[00253] A computer system can include a plurality of the same components or subsystems, for example, connected together by the external interface 4381 or by an internal interface. In some embodiments, computer systems, subsystems, or devices can communicate over a network. In such cases, one computer can be considered a client and another computer a server, where each can be part of the same computer system. A client and a server can each include multiple systems, subsystems, or components.
[00254] The specific details of particular embodiments can be combined in any suitable or varied manner from those shown and described herein without departing from the spirit and scope of the embodiments of the invention.
[00255] The above description of exemplary embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the exact form described, and many modifications and variations are possible considering the above disclosure. The embodiments have been chosen and described in order to better explain the principles of the invention and their practical applications to thereby enable others skilled in the art to better use the invention in various embodiments and with various modifications as are appropriate for the particular use considered.
[00256] All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
Claims:
Claims (30)
[0001]
1. Method for determining at least a portion of the genome of an unborn fetus of a pregnant female, the fetus having a father and a mother who is the pregnant female, the father having a paternal genome with paternal haplotypes and the mother having a maternal genome with maternal haplotypes, characterized by the fact that it comprises: (a) analyzing a plurality of nucleic acid molecules from a biological sample obtained from the pregnant female, wherein the biological sample contains a mixture of maternal and fetal nucleic acids and wherein analyzing a nucleic acid molecule includes: (i) receiving results from at least one technique selected from the group consisting of massively parallel sequencing, microarray, hybridization, PCR, digital PCR, and mass spectrometry performed on the nucleic acid molecule; (ii) identifying, using the results, a locus of the nucleic acid molecule in the human genome; and (iii) determining, using the results, a respective allele of the nucleic acid molecule; (b) determining a paternal allele inherited by the fetus from the father at each of a first plurality of loci, wherein the maternal genome is heterozygous at the first plurality of loci; (c) determining each of two maternal haplotypes of the first plurality of loci; (d) based on the determined alleles of the nucleic acid molecules, determining, with a computer system, amounts of the respective alleles at each of the first plurality of loci; (e) comparing relative amounts of the respective alleles of the nucleic acid molecules at more than one locus of the first plurality of loci, wherein comparing the relative amounts uses a cutoff value to determine whether one of the two maternal haplotypes is over-represented, equally represented or under-represented relative to the other maternal haplotype; and (f) based on the relative representation of the two maternal haplotypes determined from the comparison and considering the paternal alleles inherited by the fetus, determining which of the two maternal haplotypes is inherited by the unborn fetus from the mother in the portion of the genome covered by the first plurality of loci.
[0002]
2. Method according to claim 1, characterized in that the relative amounts include a size distribution of the nucleic acid molecules.
[0003]
3. Method, according to claim 1, characterized by the fact that (c) the determination of each of the two maternal haplotypes of the first plurality of loci is based on the analysis of the plurality of nucleic acid molecules of the biological sample.
[0004]
4. Method according to claim 1, characterized by the fact that (b) the determination of the paternal allele inherited from the father at each of the first plurality of loci includes: (i) determining a second plurality of loci at which the paternal genome is heterozygous and at which the maternal genome is homozygous; (ii) identifying, in the plurality of nucleic acid molecules, alleles that are present in the paternal genome at the respective loci of the second plurality of loci and absent in the maternal genome; (iii) identifying the inherited paternal haplotype as the haplotype with the identified alleles; and (iv) using the inherited paternal haplotype to determine the allele inherited from the father at the first plurality of loci.
[0005]
5. Method, according to claim 1, characterized by the fact that (c) the determination of each of the two maternal haplotypes of the first plurality of loci includes: (i) identifying the alleles of the maternal genome at one or more of the first plurality of loci based on the amounts of the respective alleles determined from the nucleic acid molecules at a respective locus; (ii) identifying a plurality of reference haplotypes; and (iii) comparing the identified alleles of the maternal genome with the alleles at the corresponding loci of the plurality of reference haplotypes to identify the two maternal haplotypes.
[0006]
6. Method, according to claim 5, characterized by the fact that (c) the determination of each of the two maternal haplotypes of the first plurality of loci further includes: (iv) repeatedly comparing an identified allele of the maternal genome with the plurality of reference haplotypes until each of the two maternal haplotypes is specifically identified.
[0007]
7. Method, according to claim 1, characterized by the fact that (b) the determination of the paternal allele inherited from the father in each of the first plurality of loci is based on the analysis of the plurality of nucleic acid molecules of a biological sample , and in which (b) determining the paternal allele inherited from the father in each of the first plurality of loci includes: (i) determining a second plurality of loci in which the fetal genome is heterozygous and the maternal genome is homozygous; (ii) determining the inherited allele from the father in each of the second plurality of loci: (1) determining relative amounts of the respective determined alleles of the nucleic acid molecules in the respective locus of the second plurality; and (2) identifying the allele having the relative minimum amount as the allele inherited at the respective locus; (iii) identifying a plurality of reference haplotypes; (iv) using the alleles inherited from the father in each of the second plurality of loci to determine which of the reference haplotypes is inherited from the father, the determined haplotype including the first plurality of loci; and (v) determining the alleles inherited from the father in the first plurality of loci of the determined haplotype to be inherited from the father.
[0008]
8. Method, according to claim 7, characterized by the fact that (iv) determining which of the reference haplotypes is inherited from the father includes: repeatedly comparing the alleles determined to be inherited from the father in each of the second plurality of loci with the alleles at the corresponding loci of the plurality of reference haplotypes until the reference haplotype inherited from the father is specifically identified.
[0009]
9. Method according to claim 7, characterized by the fact that the cutoff value is a first cutoff value, and (i) determining a specific locus as being one of the second plurality of loci at which the fetal genome is heterozygous and the maternal genome is homozygous includes: (1) determining a second cutoff value for a number of predicted counts of an allele at the specific locus, the second cutoff value predicting whether the maternal genome is homozygous and the fetal genome is heterozygous, where the second cutoff value is determined based on a statistical distribution of numbers of counts for different combinations of homozygosity and heterozygosity at the specific locus; (2) based on the analysis of the nucleic acid molecules of the biological sample, detecting a first allele and a second allele at the specific locus; (3) determining a number of actual counts of the first allele based on the analysis of the plurality of nucleic acid molecules in the biological sample; and (4) determining that the fetal genome is heterozygous for the first allele and the second allele and the maternal genome is homozygous for the second allele when the number of actual counts is less than the second cutoff value.
[0010]
10. Method according to claim 9, characterized by the fact that the statistical distribution is dependent on a fractional concentration of nucleic acid molecules from the biological sample that are derived from the fetus.
[0011]
11. Method according to claim 10, characterized by the fact that the statistical distribution is also dependent on the number of the plurality of nucleic acid molecules that correspond to the specific locus.
[0012]
12. Method according to claim 1, characterized by the fact that (b) determining the paternal allele inherited from the father in each of the first plurality of loci includes: (i) determining a second plurality of loci of the paternal genome that are homozygous by analysis of the paternal genome, in which the first plurality of loci is the second plurality of loci, (ii) determining the allele of the paternal genome in each of the first plurality of loci; and (iii) designate the respective alleles in the first plurality of loci as the alleles inherited from the father.
[0013]
13. Method according to claim 1, characterized in that (a) analyzing a nucleic acid molecule includes implementing at least a portion of the nucleic acid molecules with at least one technique selected from the group consisting of massively parallel sequencing , microarray, hybridization, PCR, digital PCR, and mass spectrometry.
[0014]
14. Method, according to claim 1, characterized by the fact that it further comprises: (g) for each of a first subset of neighboring loci of the first plurality of loci, determining which haplotype is inherited by the unborn fetus from the mother for a first genomic section including the first subset of neighboring loci, in which determining which haplotype is inherited includes: (i) determining a first quantity of the respective determined alleles of the nucleic acid molecules that is compatible with one of the two maternal haplotypes for the first subset; (ii) determining a second quantity of the respective determined alleles of the nucleic acid molecules that is compatible with the other of the two maternal haplotypes for the first subset of consecutive loci; and (iii) determining the inherited haplotype for the first genomic section based on a comparison of the first quantity with the second quantity.
[0015]
15. Method according to claim 14, characterized by the fact that the comparison of the first quantity with the second quantity uses the sequential probability ratio test.
[0016]
16. Method, according to claim 14, characterized by the fact that the determinations of the first quantity and the second quantity are both carried out sequentially with respect to the loci of the first subset of neighboring loci.
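The sequential comparison of claims 14 to 16 can be sketched with Wald's sequential probability ratio test, accumulating haplotype-compatible allele counts locus by locus. The two hypotheses used here (balanced representation at 0.5 versus over-representation at (1 + f) / 2, where f is the fetal DNA fraction), the error rates, and the function name are assumptions of this sketch rather than values stated in the claims.

```python
from math import log


def sprt_haplotype(counts, fetal_fraction, alpha=0.001, beta=0.001):
    """Classify a genomic section from per-locus (first, second) quantities.

    counts: list of (n_hap1, n_hap2) tuples, one per locus in genomic order,
            giving molecules compatible with each of the two maternal haplotypes.
    Returns (label, number_of_loci_used).
    """
    p0 = 0.5                       # assumed: both haplotypes equally represented
    p1 = (1 + fetal_fraction) / 2  # assumed: first haplotype over-represented
    upper = log((1 - beta) / alpha)   # Wald boundary for accepting p1
    lower = log(beta / (1 - alpha))   # Wald boundary for accepting p0
    llr = 0.0
    for used, (n_hap1, n_hap2) in enumerate(counts, start=1):
        # Add this locus's contribution to the log-likelihood ratio.
        llr += n_hap1 * log(p1 / p0) + n_hap2 * log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "over-represented", used
        if llr <= lower:
            return "balanced", used
    return "unclassified", len(counts)
```

Loci are added one at a time until a boundary is crossed, so a section with strong evidence is classified after few loci, while weakly informative sections remain unclassified.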
[0017]
17. Method, according to claim 14, characterized by the fact that the first subset of neighboring loci is further divided into two subgroups, in which the first subgroup consists of loci at which the father's genotypes are compatible with the constituent genotypes of a first haplotype of the mother, and the second subgroup consists of loci at which the father's genotypes are compatible with the constituent genotypes of a second haplotype of the mother; and where (i) to (iii) are performed individually for the two subgroups, the method further comprising: (h) determining the inherited haplotype for the first genomic section based on the results of (iii) for these two subgroups.
[0018]
18. Method, according to claim 1, characterized by the fact that it further comprises: (g) determining that the fetus inherited a mutation from the mother by: (i) analyzing the mother's haplotype that was inherited by the fetus; and (ii) identifying the mutation in the inherited haplotype.
[0019]
19. Method according to claim 1, characterized by the fact that (a) the analysis of a plurality of nucleic acid molecules in the biological sample includes: (iv) enriching the biological sample for nucleic acids in a target region of a genome; and/or (v) sequencing the nucleic acids in the target region, and where the first plurality of loci are in the target region.
[0020]
20. Method, according to claim 19, characterized by the fact that the target region is identified as containing a high number of informative loci.
[0021]
21. Method according to claim 19, characterized in that the sequencing only sequences nucleic acids in the target region.
[0022]
22. Method according to claim 1, characterized by the fact that determining which of the two maternal haplotypes is inherited by the unborn fetus from the mother in the portion of the genome covered by the first plurality of loci comprises: if the paternal alleles inherited by the fetus correspond to the alleles of a first haplotype of the mother, then the fetus is determined to have inherited the mother's first haplotype if the alleles of the corresponding loci that the first haplotype contains are over-represented in comparison to alleles of the corresponding loci contained in a second haplotype of the mother; and if the paternal alleles inherited by the fetus correspond to the alleles of the mother's first haplotype, then the fetus is determined to have inherited the mother's second haplotype if an equal representation of the alleles of the corresponding loci that the first and second haplotypes contain is observed.
[0023]
23. Method for determining at least a portion of the genome of an unborn fetus of a pregnant female, the fetus having a father and a mother who is the pregnant female, and the father having a paternal genome with paternal haplotypes and the mother having a maternal genome with maternal haplotypes, characterized by the fact that it comprises: (a) analyzing a plurality of nucleic acid molecules from a biological sample obtained from the pregnant female, where the biological sample contains a mixture of maternal and fetal nucleic acids, and in which analysis of a nucleic acid molecule includes: (i) receiving results from at least one technique selected from the group consisting of massively parallel sequencing, microarray, hybridization, PCR, digital PCR, and mass spectrometry performed on the nucleic acid molecule; (ii) identifying, using the results, a locus of the nucleic acid molecule in the human genome; and (iii) determining, using the results, a respective allele of the nucleic acid molecule; (b) determining a first plurality of loci of the paternal genome that are heterozygous, in which the paternal genome is obtained from the father of the unborn fetus, and in which the maternal genome is homozygous in the first plurality of loci; and (c) based on the respective alleles determined in the first plurality of loci, determining, with a computer system, the haplotype that is inherited by the unborn fetus from the father in the portion of the genome covered by the first plurality of loci.
[0024]
24. Method according to claim 23, characterized by the fact that (c) the determination of the haplotype that is inherited by the unborn fetus from the father includes: (i) identifying, in the plurality of nucleic acid molecules, alleles that are present in the paternal genome at respective ones of the first plurality of loci and absent in the maternal genome; and (ii) identifying the inherited paternal haplotype as the haplotype with the identified alleles.
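The two steps of claim 24 can be sketched as a lookup over loci where the father is heterozygous and the mother is homozygous: any allele seen in the maternal sample that the mother does not carry must come from the fetus, and it marks the paternal haplotype that carries it. The data layout (dictionaries keyed by locus name) and the function name are hypothetical, and sequencing errors are ignored in this sketch.

```python
def inherited_paternal_haplotype(paternal_haps, maternal_alleles, plasma_alleles):
    """Return 0 or 1 for the paternal haplotype supported by paternal-specific
    alleles observed in the maternal sample, or None if the evidence is tied.

    paternal_haps:    pair of dicts, locus -> allele on that paternal haplotype
    maternal_alleles: dict, locus -> the single allele of the homozygous mother
    plasma_alleles:   dict, locus -> set of alleles detected in the sample
    """
    support = [0, 0]
    for locus, seen in plasma_alleles.items():
        if locus not in maternal_alleles:
            continue  # locus outside the informative set
        for idx, hap in enumerate(paternal_haps):
            allele = hap.get(locus)
            # Step (i): allele present in the father, absent in the mother,
            # and detected among the sample's molecules.
            if allele is not None and allele != maternal_alleles[locus] and allele in seen:
                support[idx] += 1
    if support[0] == support[1]:
        return None
    # Step (ii): the haplotype carrying the identified alleles.
    return 0 if support[0] > support[1] else 1
```

Counting support across several loci, rather than deciding on a single locus, is a design choice of this sketch and is not required by the claim.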
[0025]
25. Method, according to claim 23, characterized by the fact that it further comprises: (d) determining that the fetus inherited a mutation from the father by: (i) analyzing the father's haplotype that was inherited by the fetus; and (ii) identifying the mutation in the inherited haplotype.
[0026]
26. Method for determining at least a portion of the genome of an unborn fetus of a pregnant female, the fetus having a father and a mother who is the pregnant female, and the father having a paternal genome with paternal haplotypes and the mother having a maternal genome with maternal haplotypes, characterized by the fact that it comprises: (a) determining a first plurality of loci of the paternal genome that are heterozygous, in which the paternal genome is obtained from the father of the unborn fetus, and in which the maternal genome, obtained from the mother of the unborn fetus, is also heterozygous in the first plurality of loci, and in which each of two paternal haplotypes and each of two maternal haplotypes in the first plurality of loci are known; (b) determining one or more secondary loci of the paternal genome that are heterozygous, where the maternal genome is homozygous at the secondary loci, and where the first plurality of loci and the secondary loci are on the same chromosome; (c) analyzing a plurality of nucleic acid molecules from a biological sample obtained from the pregnant female, where the biological sample contains a mixture of maternal and fetal nucleic acids, and in which the analysis of a nucleic acid molecule includes: (i) receiving results from at least one technique selected from the group consisting of massively parallel sequencing, microarray, hybridization, PCR, digital PCR, and mass spectrometry performed on the nucleic acid molecule; (ii) identifying, using the results, a locus of the nucleic acid molecule in the human genome; and (iii) determining, using the results, a respective allele of the nucleic acid molecule; (d) determining which of the two paternal haplotypes was inherited by the fetus by analyzing the respective alleles determined from the plurality of nucleic acid molecules of the biological sample in at least one of the secondary loci; (e) comparing, with a computer system, relative amounts of the respective determined alleles of the nucleic acid molecules in more than one locus of the first plurality of loci; and (f) based on the paternal haplotype determined to be inherited by the fetus and based on the comparison of relative quantities, determining the haplotype that is inherited by the unborn fetus from the mother in the portion of the genome covered by the first plurality of loci.
[0027]
27. Method for determining at least a portion of the genome of an unborn fetus of a pregnant female, the fetus having a father and a mother who is the pregnant female, and the father having a paternal genome with paternal haplotypes and the mother having a maternal genome with maternal haplotypes, characterized by the fact that it comprises: (a) analyzing a plurality of nucleic acid molecules from a biological sample obtained from the pregnant female, where the biological sample contains a mixture of maternal and fetal nucleic acids, and in which the analysis of a nucleic acid molecule includes: (i) receiving results from at least one technique selected from the group consisting of massively parallel sequencing, microarray, hybridization, PCR, digital PCR, and mass spectrometry performed on the nucleic acid molecule; (ii) identifying, using the results, a locus of the nucleic acid molecule in the human genome; and (iii) determining, using the results, a respective allele of the nucleic acid molecule; (b) determining a first plurality of loci in which the fetal genome is heterozygous and the maternal genome is homozygous; (c) determining, with a computer system, an allele inherited from the father in each of the first plurality of loci by: (i) determining the relative amounts of the respective determined alleles of the nucleic acid molecules in the respective locus of the first plurality; and (ii) identifying the allele having the minimum relative quantity as being the allele inherited in the respective locus; (d) identifying a plurality of reference haplotypes; and (e) using the alleles inherited from the father in each of the first plurality of loci to determine which of the reference haplotypes is inherited from the father in the portion of the genome covered by the first plurality of loci.
[0028]
28. Method according to claim 27, characterized by the fact that (e) determining which reference haplotypes are inherited from the father includes: repeatedly comparing the alleles determined to be inherited from the father in each of the first plurality of loci with the alleles at the corresponding loci of the plurality of reference haplotypes until the reference haplotype inherited from the father is specifically identified.
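Steps (c)(ii) and (e) of claim 27, together with the repeated comparison of claim 28, can be sketched as follows: at each locus the less abundant allele is taken as the paternally inherited one, and the resulting alleles are compared with a panel of reference haplotypes locus by locus until only one candidate remains. The dictionary layout, the haplotype names, and both function names are hypothetical.

```python
def paternal_allele(allele_counts):
    """Claim 27 (c)(ii): the allele with the minimum relative quantity at a locus.

    allele_counts: dict, allele -> number of molecules carrying it.
    """
    return min(allele_counts, key=allele_counts.get)


def match_reference_haplotype(inherited, reference_haps):
    """Claim 28: narrow the reference panel until one haplotype remains.

    inherited:      dict, locus -> allele inferred to come from the father
    reference_haps: dict, haplotype name -> dict of locus -> allele
    Returns the name of the uniquely matching haplotype, or None.
    """
    candidates = set(reference_haps)
    for locus, allele in inherited.items():
        # Keep only the reference haplotypes that agree at this locus.
        candidates = {name for name in candidates
                      if reference_haps[name].get(locus) == allele}
        if len(candidates) <= 1:
            break  # uniquely identified, or no haplotype fits
    return candidates.pop() if len(candidates) == 1 else None
```

Stopping as soon as one candidate remains follows the wording "until the reference haplotype inherited from the father is specifically identified"; a stricter variant could continue through the remaining loci to confirm the match.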
[0029]
29. Method according to claim 27, characterized by the fact that (b) the determination of a specific locus as being one of the first plurality of loci in which the fetal genome is heterozygous and the maternal genome is homozygous includes: (i) determining a cutoff value for a number of predicted counts of an allele at the specific locus, the cutoff value predicting whether the maternal genome is homozygous and the fetal genome is heterozygous, where the cutoff value is determined based on a statistical distribution of count numbers for different combinations of homozygosity and heterozygosity at the specific locus; (ii) based on the analysis of the nucleic acid molecules in the biological sample, detecting a first allele and a second allele at the specific locus; (iii) determining a number of actual counts of the first allele based on sequencing the plurality of nucleic acid molecules in the biological sample; and (iv) determining that the fetal genome is heterozygous for the first allele and a second allele and the maternal genome is homozygous for the second allele when the number of actual counts is less than the cutoff value.
[0030]
30. Non-transitory computer-readable medium, characterized by the fact that it stores a plurality of instructions that, when executed, control a computer system to perform any of the methods as defined in claims 1 to 12, 14 to 18 and 22 to 29.
Similar technologies:
Publication number | Publication date | Patent title
US20180282807A1|2018-10-04|Identifying a de novo fetal mutation from a maternal biological sample
JP6001721B2|2016-10-05|Genome analysis based on size
AU2013203446B2|2015-05-14|Identifying a de novo fetal mutation from a maternal biological sample
AU2015200462B2|2016-11-24|Size-based genomic analysis
Patent family:
Publication number | Publication date
EP2496717B1|2017-06-07|
JP6386494B2|2018-09-05|
IL219521A|2015-02-26|
BR112012010694B8|2021-07-27|
RS58879B1|2019-08-30|
EP3498863A1|2019-06-19|
HUE043574T2|2019-08-28|
IL219521D0|2012-06-28|
JP6023117B2|2016-11-09|
MX2012005214A|2012-09-21|
DK3241914T3|2019-04-23|
EP3241914B1|2019-03-06|
EP3783110A1|2021-02-24|
CN105779280B|2018-09-25|
AU2010315037A1|2012-06-07|
TW201122470A|2011-07-01|
US9512480B2|2016-12-06|
US10093976B2|2018-10-09|
EP3498863B1|2020-11-04|
US8467976B2|2013-06-18|
MX355132B|2018-04-06|
BR112012010694A2|2018-09-11|
SI3241914T1|2019-07-31|
JP2016195598A|2016-11-24|
US20180282807A1|2018-10-04|
AU2010315037A8|2012-07-12|
AU2010315037B2|2014-09-18|
PL2496717T3|2017-11-30|
JP2014193165A|2014-10-09|
HRP20190601T1|2019-10-04|
EP3241914A1|2017-11-08|
EP2496717A1|2012-09-12|
DK2496717T3|2017-07-24|
PT2496717T|2017-07-12|
CN105779280A|2016-07-20|
LT3241914T|2019-04-25|
PT3241914T|2019-04-30|
CN102770558A|2012-11-07|
IL237175A|2017-05-29|
TWI458976B|2014-11-01|
IL237179A|2016-05-31|
US20130253844A1|2013-09-26|
CA2779695C|2016-05-24|
CN102770558B|2016-04-06|
ES2720282T3|2019-07-19|
JP2013509884A|2013-03-21|
JP5540105B2|2014-07-02|
US20110105353A1|2011-05-05|
HK1222413A1|2017-06-30|
HUE034854T2|2018-03-28|
PL3241914T3|2019-08-30|
EA033752B1|2019-11-21|
WO2011057094A1|2011-05-12|
AU2010315037B9|2015-04-23|
US20130323731A1|2013-12-05|
CA2779695A1|2011-05-12|
EA201200690A1|2013-05-30|
ES2628874T3|2017-08-04|
引用文献:
公开号 | 申请日 | 公开日 | 申请人 | 专利标题

GB9704444D0|1997-03-04|1997-04-23|Isis Innovation|Non-invasive prenatal diagnosis|
US6664056B2|2000-10-17|2003-12-16|The Chinese University Of Hong Kong|Non-invasive prenatal monitoring|
US6927028B2|2001-08-31|2005-08-09|Chinese University Of Hong Kong|Non-invasive methods for detecting non-host DNA in a host using epigenetic differences between the host and non-host DNA|
US20070178478A1|2002-05-08|2007-08-02|Dhallan Ravinder S|Methods for detection of genetic disorders|
US6977162B2|2002-03-01|2005-12-20|Ravgen, Inc.|Rapid analysis of variations in a genome|
US7727720B2|2002-05-08|2010-06-01|Ravgen, Inc.|Methods for detection of genetic disorders|
JP2006508632A|2002-03-01|2006-03-16|ラブジェン,インコーポレイテッド|Methods for detecting genetic diseases|
AU2003268333A1|2003-02-28|2004-09-28|Ravgen, Inc.|Methods for detection of genetic disorders|
RU2200761C1|2002-04-01|2003-03-20|Московский НИИ педиатрии и детской хирургии|Set of recombinant plasmid dna pyai 11-19, pyai 2-45, pys 37 and pyai 7-29 for determination of origin of human accessory or marker chromosomes|
WO2004078999A1|2003-03-05|2004-09-16|Genetic Technologies Limited|Identification of fetal dna and fetal cell markers in maternal plasma or serum|
EP1524321B2|2003-10-16|2014-07-23|Sequenom, Inc.|Non-invasive detection of fetal genetic traits|
EP1859050B1|2005-03-18|2012-10-24|The Chinese University Of Hong Kong|A method for the detection of chromosomal aneuploidies|
US20070122823A1|2005-09-01|2007-05-31|Bianchi Diana W|Amniotic fluid cell-free fetal DNA fragment size pattern for prenatal diagnosis|
GB0523276D0|2005-11-15|2005-12-21|London Bridge Fertility|Chromosomal analysis by molecular karyotyping|
LT3002338T|2006-02-02|2019-10-25|Univ Leland Stanford Junior|Non-invasive fetal genetic screening by digital analysis|
WO2007100911A2|2006-02-28|2007-09-07|University Of Louisville Research Foundation|Detecting fetal chromosomal abnormalities using tandem single nucleotide polymorphisms|
US20100112590A1|2007-07-23|2010-05-06|The Chinese University Of Hong Kong|Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment|
EA017966B1|2007-07-23|2013-04-30|Те Чайниз Юниверсити Ов Гонгконг|Diagnosing fetal chromosomal aneuploidy using genomic sequencing|
CA2731991C|2008-08-04|2021-06-08|Gene Security Network, Inc.|Methods for allele calling and ploidy calling|
SG172345A1|2008-12-22|2011-07-28|Celula Inc|Methods and genotyping panels for detecting alleles, genomes, and transcriptomes|
JP5540105B2|2009-11-05|2014-07-02|ザチャイニーズユニバーシティオブホンコン|Fetal genome analysis of maternal biological samples|
JP5770737B2|2009-11-06|2015-08-26|ザ チャイニーズ ユニバーシティ オブ ホンコン|Genome analysis based on size|
US9260745B2|2010-01-19|2016-02-16|Verinata Health, Inc.|Detecting and classifying copy number variation|
AU2011255641A1|2010-05-18|2012-12-06|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|
US20120190021A1|2011-01-25|2012-07-26|Aria Diagnostics, Inc.|Detection of genetic abnormalities|
CN106011237B|2011-02-24|2019-12-13|香港中文大学|Molecular testing of multiple pregnancies|
EP2971126B1|2013-03-15|2018-11-07|The Chinese University Of Hong Kong|Determining fetal genomes for multiple fetus pregnancies|GB2257707B|1991-05-27|1995-11-01|Nippon Zeon Co|Adhesive composition|
US11111543B2|2005-07-29|2021-09-07|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|
US11111544B2|2005-07-29|2021-09-07|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|
US8532930B2|2005-11-26|2013-09-10|Natera, Inc.|Method for determining the number of copies of a chromosome in the genome of a target individual using genetic data from genetically related individuals|
US10083273B2|2005-07-29|2018-09-25|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|
US10081839B2|2005-07-29|2018-09-25|Natera, Inc|System and method for cleaning noisy genetic data and determining chromosome copy number|
US9424392B2|2005-11-26|2016-08-23|Natera, Inc.|System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals|
WO2007100911A2|2006-02-28|2007-09-07|University Of Louisville Research Foundation|Detecting fetal chromosomal abnormalities using tandem single nucleotide polymorphisms|
US8609338B2|2006-02-28|2013-12-17|University Of Louisville Research Foundation, Inc.|Detecting fetal chromosomal abnormalities using tandem single nucleotide polymorphisms|
EP2029779A4|2006-06-14|2010-01-20|Living Microsystems Inc|Use of highly parallel snp genotyping for fetal diagnosis|
US20080050739A1|2006-06-14|2008-02-28|Roland Stoughton|Diagnosis of fetal abnormalities using polymorphisms including short tandem repeats|
US9524369B2|2009-06-15|2016-12-20|Complete Genomics, Inc.|Processing and analysis of complex nucleic acid sequence data|
WO2009105531A1|2008-02-19|2009-08-27|Gene Security Network, Inc.|Methods for cell genotyping|
EP2271772B1|2008-03-11|2014-07-16|Sequenom, Inc.|Nucleic acid-based tests for prenatal gender determination|
WO2009146335A1|2008-05-27|2009-12-03|Gene Security Network, Inc.|Methods for embryo characterization and comparison|
CA2731991C|2008-08-04|2021-06-08|Gene Security Network, Inc.|Methods for allele calling and ploidy calling|
US8476013B2|2008-09-16|2013-07-02|Sequenom, Inc.|Processes and compositions for methylation-based acid enrichment of fetal nucleic acid from a maternal sample useful for non-invasive prenatal diagnoses|
US8962247B2|2008-09-16|2015-02-24|Sequenom, Inc.|Processes and compositions for methylation-based enrichment of fetal nucleic acid from a maternal sample useful for non invasive prenatal diagnoses|
WO2013130848A1|2012-02-29|2013-09-06|Natera, Inc.|Informatics enhanced analysis of fetal samples subject to maternal contamination|
US10316362B2|2010-05-18|2019-06-11|Natera, Inc.|Methods for simultaneous amplification of target loci|
US20190010543A1|2010-05-18|2019-01-10|Natera, Inc.|Methods for simultaneous amplification of target loci|
EP2473638B1|2009-09-30|2017-08-09|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|
JP5540105B2|2009-11-05|2014-07-02|ザチャイニーズユニバーシティオブホンコン|Fetal genome analysis of maternal biological samples|
EP2504448B1|2009-11-25|2016-10-19|Bio-Rad Laboratories, Inc.|Methods and compositions for detecting genetic material|
US9926593B2|2009-12-22|2018-03-27|Sequenom, Inc.|Processes and kits for identifying aneuploidy|
US9260745B2|2010-01-19|2016-02-16|Verinata Health, Inc.|Detecting and classifying copy number variation|
WO2011090556A1|2010-01-19|2011-07-28|Verinata Health, Inc.|Methods for determining fraction of fetal nucleic acid in maternal samples|
EP2526415B1|2010-01-19|2017-05-03|Verinata Health, Inc|Partition defined detection methods|
ES2534986T3|2010-01-19|2015-05-04|Verinata Health, Inc|Simultaneous determination of aneuploidy and fetal fraction|
US10388403B2|2010-01-19|2019-08-20|Verinata Health, Inc.|Analyzing copy number variation in the detection of cancer|
EP2513341B1|2010-01-19|2017-04-12|Verinata Health, Inc|Identification of polymorphic sequences in mixtures of genomic dna by whole genome sequencing|
US9323888B2|2010-01-19|2016-04-26|Verinata Health, Inc.|Detecting and classifying copy number variation|
US9411937B2|2011-04-15|2016-08-09|Verinata Health, Inc.|Detecting and classifying copy number variation|
US20110312503A1|2010-01-23|2011-12-22|Artemis Health, Inc.|Methods of fetal abnormality detection|
RU2620959C2|2010-12-22|2017-05-30|Натера, Инк.|Methods of noninvasive prenatal paternity determination|
CA2824387C|2011-02-09|2019-09-24|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|
AU2011255641A1|2010-05-18|2012-12-06|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|
SG186787A1|2010-07-23|2013-02-28|Esoterix Genetic Lab Llc|Identification of differentially represented fetal or maternal genomic regions and uses thereof|
US11203786B2|2010-08-06|2021-12-21|Ariosa Diagnostics, Inc.|Detection of target nucleic acids using hybridization|
US20120034603A1|2010-08-06|2012-02-09|Tandem Diagnostics, Inc.|Ligation-based detection of genetic variants|
US10533223B2|2010-08-06|2020-01-14|Ariosa Diagnostics, Inc.|Detection of target nucleic acids using hybridization|
US20140342940A1|2011-01-25|2014-11-20|Ariosa Diagnostics, Inc.|Detection of Target Nucleic Acids using Hybridization|
US11031095B2|2010-08-06|2021-06-08|Ariosa Diagnostics, Inc.|Assay systems for determination of fetal copy number variation|
US20130261003A1|2010-08-06|2013-10-03|Ariosa Diagnostics, In.|Ligation-based detection of genetic variants|
US10167508B2|2010-08-06|2019-01-01|Ariosa Diagnostics, Inc.|Detection of genetic abnormalities|
CN103403182B|2010-11-30|2015-11-25|香港中文大学|The heredity relevant to cancer or the detection of molecular distortion|
US8877442B2|2010-12-07|2014-11-04|The Board Of Trustees Of The Leland Stanford Junior University|Non-invasive determination of fetal inheritance of parental haplotypes at the genome-wide scale|
US11270781B2|2011-01-25|2022-03-08|Ariosa Diagnostics, Inc.|Statistical analysis for non-invasive sex chromosome aneuploidy determination|
US10131947B2|2011-01-25|2018-11-20|Ariosa Diagnostics, Inc.|Noninvasive detection of fetal aneuploidy in egg donor pregnancies|
US20120190021A1|2011-01-25|2012-07-26|Aria Diagnostics, Inc.|Detection of genetic abnormalities|
US8756020B2|2011-01-25|2014-06-17|Ariosa Diagnostics, Inc.|Enhanced risk probabilities using biomolecule estimations|
US8700338B2|2011-01-25|2014-04-15|Ariosa Diagnosis, Inc.|Risk calculation for evaluation of fetal aneuploidy|
CA2826748C|2011-02-09|2020-08-04|Bio-Rad Laboratories, Inc.|Method of detecting variations in copy number of a target nucleic acid|
CN106011237B|2011-02-24|2019-12-13|香港中文大学|Molecular testing of multiple pregnancies|
LT3078752T|2011-04-12|2018-11-26|Verinata Health, Inc.|Resolving genome fractions using polymorphism counts|
GB2484764B|2011-04-14|2012-09-05|Verinata Health Inc|Normalizing chromosomes for the determination and verification of common and rare chromosomal aneuploidies|
WO2012177792A2|2011-06-24|2012-12-27|Sequenom, Inc.|Methods and processes for non-invasive assessment of a genetic variation|
US20130040375A1|2011-08-08|2013-02-14|Tandem Diagnotics, Inc.|Assay systems for genetic analysis|
US9679103B2|2011-08-25|2017-06-13|Complete Genomics, Inc.|Phasing of heterozygous loci to determine genomic haplotypes|
US8712697B2|2011-09-07|2014-04-29|Ariosa Diagnostics, Inc.|Determination of copy number variations using binomial probability calculations|
US9367663B2|2011-10-06|2016-06-14|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US20140242588A1|2011-10-06|2014-08-28|Sequenom, Inc|Methods and processes for non-invasive assessment of genetic variations|
US9984198B2|2011-10-06|2018-05-29|Sequenom, Inc.|Reducing sequence read count error in assessment of complex genetic variations|
US10424394B2|2011-10-06|2019-09-24|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US10196681B2|2011-10-06|2019-02-05|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
EP3401399B1|2012-03-02|2020-04-22|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US9892230B2|2012-03-08|2018-02-13|The Chinese University Of Hong Kong|Size-based analysis of fetal or tumor DNA fraction in plasma|
WO2013138527A1|2012-03-13|2013-09-19|The Chinese University Of Hong Kong|Methods for analyzing massively parallel sequencing data for noninvasive prenatal diagnosis|
US9238836B2|2012-03-30|2016-01-19|Pacific Biosciences Of California, Inc.|Methods and compositions for sequencing modified nucleic acids|
RU2597981C2|2012-05-14|2016-09-20|БГИ Диагносис Ко., Лтд.|Method and system for determining nucleotide sequence in given region of foetal genome|
US9920361B2|2012-05-21|2018-03-20|Sequenom, Inc.|Methods and compositions for analyzing nucleic acid|
US10289800B2|2012-05-21|2019-05-14|Ariosa Diagnostics, Inc.|Processes for calculating phased fetal genomic sequences|
WO2013177581A2|2012-05-24|2013-11-28|University Of Washington Through Its Center For Commercialization|Whole genome sequencing of a human fetus|
US11261494B2|2012-06-21|2022-03-01|The Chinese University Of Hong Kong|Method of measuring a fractional concentration of tumor DNA|
US10497461B2|2012-06-22|2019-12-03|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
WO2014012107A2|2012-07-13|2014-01-16|Life Technologies Corporation|Human identifiation using a panel of snps|
EP2872648B1|2012-07-13|2019-09-04|Sequenom, Inc.|Processes and compositions for methylation-based enrichment of fetal nucleic acid from a maternal sample useful for non-invasive prenatal diagnoses|
CA2878280A1|2012-07-19|2014-01-23|Ariosa Diagnostics, Inc.|Multiplexed sequential ligation-based detection of genetic variants|
US20140065621A1|2012-09-04|2014-03-06|Natera, Inc.|Methods for increasing fetal fraction in maternal blood|
EP2893478A1|2012-09-06|2015-07-15|Ancestry.com DNA LLC|Using haplotypes to infer ancestral origins for recently admixed individuals|
US10706957B2|2012-09-20|2020-07-07|The Chinese University Of Hong Kong|Non-invasive determination of methylome of tumor from plasma|
DK2898100T3|2012-09-20|2018-02-26|Univ Hong Kong Chinese|NON-INVASIVE DETERMINATION OF A FOSTER METHYLOM OR PLASMA TUMOR|
US9732390B2|2012-09-20|2017-08-15|The Chinese University Of Hong Kong|Non-invasive determination of methylome of fetus or tumor from plasma|
US10482994B2|2012-10-04|2019-11-19|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US9213947B1|2012-11-08|2015-12-15|23Andme, Inc.|Scalable pipeline for local ancestry inference|
US9367800B1|2012-11-08|2016-06-14|23Andme, Inc.|Ancestry painting with local ancestry inference|
US10504613B2|2012-12-20|2019-12-10|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US10643738B2|2013-01-10|2020-05-05|The Chinese University Of Hong Kong|Noninvasive prenatal molecular karyotyping from maternal plasma|
US20130309666A1|2013-01-25|2013-11-21|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
EP3587588B1|2013-02-28|2021-06-30|The Chinese University Of Hong Kong|Maternal plasma transcriptome analysis by massively parallel rna sequencing|
US9994897B2|2013-03-08|2018-06-12|Ariosa Diagnostics, Inc.|Non-invasive fetal sex determination|
EP2971100A1|2013-03-13|2016-01-20|Sequenom, Inc.|Primers for dna methylation analysis|
EP2971126B1|2013-03-15|2018-11-07|The Chinese University Of Hong Kong|Determining fetal genomes for multiple fetus pregnancies|
EP2981921A1|2013-04-03|2016-02-10|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
CA2909479A1|2013-05-09|2014-11-13|F. Hoffmann-La Roche Ag|Method of determining the fraction of fetal dna in maternal blood using hla markers|
CN112575075A|2013-05-24|2021-03-30|塞昆纳姆股份有限公司|Methods and processes for non-invasive assessment of genetic variation|
DK3011051T3|2013-06-21|2019-04-23|Sequenom Inc|Method for non-invasive evaluation of genetic variations|
US20150004601A1|2013-06-28|2015-01-01|Ariosa Diagnostics, Inc.|Massively parallel sequencing of random dna fragments for determination of fetal fraction|
WO2015048535A1|2013-09-27|2015-04-02|Natera, Inc.|Prenatal diagnostic resting standards|
US10577655B2|2013-09-27|2020-03-03|Natera, Inc.|Cell free DNA diagnostic testing standards|
JP6525434B2|2013-10-04|2019-06-05|セクエノム, インコーポレイテッド|Methods and processes for non-invasive assessment of gene mutations|
GB201318369D0|2013-10-17|2013-12-04|Univ Leuven Kath|Methods using BAF|
US10262755B2|2014-04-21|2019-04-16|Natera, Inc.|Detecting cancer mutations and aneuploidy in chromosomal segments|
US10179937B2|2014-04-21|2019-01-15|Natera, Inc.|Detecting mutations and ploidy in chromosomal segments|
US9677118B2|2014-04-21|2017-06-13|Natera, Inc.|Methods for simultaneous amplification of target loci|
AU2015289414B2|2014-07-18|2021-07-08|Illumina, Inc.|Non-invasive prenatal diagnosis of fetal genetic condition using cellular DNA and cell free DNA|
US20160026759A1|2014-07-22|2016-01-28|Yourgene Bioscience|Detecting Chromosomal Aneuploidy|
CN104182655B|2014-09-01|2017-03-08|上海美吉生物医药科技有限公司|A kind of method for judging fetus genotype|
CN104232778B|2014-09-19|2016-08-17|天津华大基因科技有限公司|Determine the method and device of fetus haplotype and chromosomal aneuploidy simultaneously|
CN105648045B|2014-11-13|2019-10-11|天津华大基因科技有限公司|The method and apparatus for determining fetus target area haplotype|
CN105648044B|2014-11-13|2019-10-11|天津华大基因科技有限公司|The method and apparatus for determining fetus target area haplotype|
CN104561309B|2015-01-04|2017-04-19|北京积水潭医院|Kit for predicting birth safety before people-assisted reproduction blastosphere implantation|
CN104561311B|2015-01-04|2016-08-17|北京大学第三医院|A kind of test kit of safety prediction of being born in early days from people's supplementary reproduction fetal development|
US10364467B2|2015-01-13|2019-07-30|The Chinese University Of Hong Kong|Using size and number aberrations in plasma DNA for detecting cancer|
WO2016112539A1|2015-01-16|2016-07-21|深圳华大基因股份有限公司|Method and device for determining fetal nucleic acid content|
EP3967775A1|2015-07-23|2022-03-16|The Chinese University Of Hong Kong|Analysis of fragmentation patterns of cell-free dna|
KR20170125044A|2015-02-10|2017-11-13|더 차이니즈 유니버시티 오브 홍콩|Mutation detection for cancer screening and fetal analysis|
CN106021992A|2015-03-27|2016-10-12|知源生信公司(美国硅谷)|Computation pipeline of location-dependent variant calls|
CA2986036A1|2015-05-18|2016-11-24|Karius, Inc.|Compositions and methods for enriching populations of nucleic acids|
EP3317420B1|2015-07-02|2021-10-20|Arima Genomics, Inc.|Accurate molecular deconvolution of mixtures samples|
EP3325663B1|2015-07-20|2020-08-19|The Chinese University Of Hong Kong|Methylation pattern analysis of haplotypes in tissues in dna mixture|
SG10202107693TA|2015-09-22|2021-09-29|Univ Hong Kong Chinese|Accurate quantification of fetal dna fraction by shallow-depth sequencing of maternal plasma dna|
GB201518665D0|2015-10-21|2015-12-02|Singapore Volition Pte Ltd|Method for enrichment of cell free nucleosomes|
CN105335625B|2015-11-04|2018-02-16|和卓生物科技(上海)有限公司|Preimplantation genetic testing device|
CN105926043B|2016-04-19|2018-08-28|苏州贝康医疗器械有限公司|Method for increasing the proportion of fetal cell-free DNA in a maternal plasma cell-free DNA sequencing library|
US11200963B2|2016-07-27|2021-12-14|Sequenom, Inc.|Genetic copy number alteration classifications|
AU2018212272A1|2017-01-25|2019-07-18|Grail, Inc.|Diagnostic applications using nucleic acid fragments|
CN109996894A|2016-11-18|2019-07-09|香港中文大学|Universal haplotype-based noninvasive prenatal testing for single-gene diseases|
US10011870B2|2016-12-07|2018-07-03|Natera, Inc.|Compositions and methods for identifying nucleic acid molecules|
WO2018156418A1|2017-02-21|2018-08-30|Natera, Inc.|Compositions, methods, and kits for isolating nucleic acids|
EP3602359A4|2017-03-24|2021-01-06|Myriad Women's Health, Inc.|Copy number variant caller|
WO2018209222A1|2017-05-12|2018-11-15|Massachusetts Institute Of Technology|Systems and methods for crowdsourcing, analyzing, and/or matching personal data|
WO2019010410A1|2017-07-07|2019-01-10|Massachusetts Institute Of Technology|Systems and methods for genetic identification and analysis|
CN109280697A|2017-07-20|2019-01-29|天昊生物医药科技(苏州)有限公司|Method for fetal genotype identification using maternal plasma cell-free DNA|
CN107545153B|2017-10-25|2021-06-11|桂林电子科技大学|Nucleosome classification prediction method based on convolutional neural network|
WO2019200228A1|2018-04-14|2019-10-17|Natera, Inc.|Methods for cancer detection and monitoring by means of personalized detection of circulating tumor dna|
WO2020049558A1|2018-09-03|2020-03-12|Ramot At Tel-Aviv University Ltd.|Method and system for identifying gene disorder in maternal blood|
EP3899030A2|2018-12-17|2021-10-27|Natera, Inc.|Methods for analysis of circulating cells|
CA3137130A1|2019-04-22|2020-10-29|Personal Genome Diagnostics Inc.|Methods and systems for genetic analysis|
US11091794B2|2019-08-16|2021-08-17|The Chinese University Of Hong Kong|Determination of base modifications of nucleic acids|
CN112466397A|2019-09-09|2021-03-09|深圳乐土生物科技有限公司|Method and device for detecting genetic relationship|
CN111312332B|2020-02-13|2020-10-30|国家卫生健康委科学技术研究所|Biological information processing method and device based on HLA genes and terminal|
Legal status:
2019-07-30| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-04-28| B07A| Technical examination (opinion): publication of technical examination (opinion) [chapter 7.1 patent gazette]|
2020-09-15| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2020-11-17| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 10 (TEN) YEARS COUNTED FROM 17/11/2020, SUBJECT TO THE LEGAL CONDITIONS. |
2021-07-27| B16C| Correction of notification of the grant|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 05/11/2010, SUBJECT TO THE LEGAL CONDITIONS. PATENT GRANTED IN ACCORDANCE WITH ADI 5.529/DF, WHICH DETERMINES THE CHANGE OF THE GRANT TERM |
Priority:
Application number | Filing date | Patent title
US25856709P| true| 2009-11-05|2009-11-05|
US61/258567|2009-11-05|
US25907509P| true| 2009-11-06|2009-11-06|
US61/259075|2009-11-06|
US38185410P| true| 2010-09-10|2010-09-10|
US61/381854|2010-09-10|
PCT/US2010/055655|WO2011057094A1|2009-11-05|2010-11-05|Fetal genomic analysis from a maternal biological sample|